Introduction

Set-functions appear in many areas of computer science and applied mathematics, such as
machine learning [1, 2, 3, 4], computer vision [5, 6], operations research [7] or electrical
networks [8]. Among these set-functions, submodular functions play an important role,
similar to that of convex functions on vector spaces. In this tutorial, the theory of submodular
functions is presented in a self-contained way, with all results shown from first principles.
A good knowledge of convex analysis is assumed (see, e.g., [9, 10]).

Several books and tutorial articles already exist on the same topic, and the material presented
in this tutorial relies mostly on those [11, 8, 12, 13]. However, in order to present the material
in the simplest way, ideas from related research papers have also been used.

Notation. We consider the set $V = \{1, \dots, p\}$ and its power set $2^V$, composed of the
$2^p$ subsets of $V$. Given a vector $s \in \mathbb{R}^p$, $s$ also denotes the modular set-function defined
as $s(A) = \sum_{k \in A} s_k$. Moreover, $A \subseteq B$ means that $A$ is a subset of $B$, potentially equal
to $B$. For $q \in [1, +\infty]$, we denote by $\|w\|_q$ the $\ell_q$-norm of $w$, by $|A|$ the cardinality of
the set $A$, and, for $A \subseteq V = \{1, \dots, p\}$, $1_A$ denotes the indicator vector of the set $A$. If
$w \in \mathbb{R}^p$ and $\alpha \in \mathbb{R}$, then $\{w \geq \alpha\}$ (resp. $\{w > \alpha\}$) denotes the subset of $V = \{1, \dots, p\}$
defined as $\{k \in V,\ w_k \geq \alpha\}$ (resp. $\{k \in V,\ w_k > \alpha\}$). Similarly, if $v \in \mathbb{R}^p$, we have
$\{w \geq v\} = \{k \in V,\ w_k \geq v_k\}$.



Tutorial outline. In Section 1, we give the different definitions of submodular functions
and of the associated polyhedra. In Section 2, we define the Lovász extension and give
its main properties. Associated polyhedra are further studied in Section 3, where support
functions and the associated maximizers are computed (we also detail the facial structure
of such polyhedra). In Section 4, we provide some duality theory for submodular functions,
while in Section 5, we present several operations that preserve submodularity. In Section 6,
we consider separable optimization problems associated with the Lovász extension; these are
reinterpreted in Section 7 as separable optimization over the submodular or base polyhedra.
In Section 8, we present various approaches to submodular function minimization (without
all details of algorithms). In Section 9, we specialize some of our results to non-decreasing
submodular functions. Finally, in Section 10, we present classical examples of submodular
functions.
1 Definitions



Throughout this tutorial, we consider $V = \{1, \dots, p\}$, $p > 0$, and its power set (i.e., the set
of all subsets) $2^V$, which is of cardinality $2^p$. We also consider a real-valued set-function
$F : 2^V \to \mathbb{R}$ such that $F(\emptyset) = 0$. As opposed to the common convention with convex
functions, we do not allow infinite values for the function $F$.

Definition 1 (Submodular function) A set-function $F : 2^V \to \mathbb{R}$ is submodular if and
only if, for all subsets $A, B \subseteq V$, we have: $F(A) + F(B) \geq F(A \cup B) + F(A \cap B)$.

The simplest example of a submodular function is the cardinality (i.e., $F(A) = |A|$, where
$|A|$ is the number of elements of $A$), which is both submodular and supermodular (i.e., its
opposite is submodular); we refer to such functions as modular.

From Def. 1, it is clear that the set of submodular functions is closed under addition and
multiplication by a positive scalar. The following proposition shows that a submodular
function has the "diminishing returns" property, and that this property is sufficient for
submodularity. Thus, submodular functions may be seen as a discrete analog of concave
functions. However, in terms of optimization they behave more like convex functions (e.g.,
efficient minimization, duality theory, links with the convex Lovász extension).

Proposition 1 (Equivalent definition with first order differences) $F$ is submodular
if and only if for all $A, B \subseteq V$ and $k \in V$, such that $A \subseteq B$ and $k \notin B$, we have
$F(A \cup \{k\}) - F(A) \geq F(B \cup \{k\}) - F(B)$.

Proof Let $A \subseteq B$ and $k \notin B$. We have $F(A \cup \{k\}) - F(A) - F(B \cup \{k\}) + F(B) = F(C) + F(D) -
F(C \cup D) - F(C \cap D)$ with $C = A \cup \{k\}$ and $D = B$, which shows that the condition is
necessary. To prove the opposite, we assume that the condition is satisfied; one can first
show that if $A \subseteq B$ and $C \cap B = \emptyset$, then $F(A \cup C) - F(A) \geq F(B \cup C) - F(B)$ (this can
be obtained by summing the $m$ inequalities $F(A \cup \{c_1, \dots, c_k\}) - F(A \cup \{c_1, \dots, c_{k-1}\}) \geq
F(B \cup \{c_1, \dots, c_k\}) - F(B \cup \{c_1, \dots, c_{k-1}\})$, where $C = \{c_1, \dots, c_m\}$).

Then for any $X, Y \subseteq V$, take $A = X \cap Y$, $C = X \setminus Y$ and $B = Y$ to obtain $F(X) + F(Y) \geq
F(X \cup Y) + F(X \cap Y)$, which shows that the condition is sufficient. ■



The following proposition gives the tightest condition for submodularity (easiest to show in 
practice) . 

Proposition 2 (Equivalent definition with second order differences) $F$ is submodular
if and only if for all $A \subseteq V$ and $j, k \in V \setminus A$, we have $F(A \cup \{k\}) - F(A) \geq
F(A \cup \{j, k\}) - F(A \cup \{j\})$.
Proof This condition is weaker than the one from the previous proposition. To prove that
it is still sufficient, simply apply it to subsets $A \cup \{b_1, \dots, b_{s-1}\}$, $j = b_s$ for $B = A \cup
\{b_1, \dots, b_m\} \supseteq A$ with $k \notin B$, and sum the $m$ inequalities $F(A \cup \{b_1, \dots, b_{s-1}\} \cup \{k\}) -
F(A \cup \{b_1, \dots, b_{s-1}\}) \geq F(A \cup \{b_1, \dots, b_s\} \cup \{k\}) - F(A \cup \{b_1, \dots, b_s\})$, to obtain the
condition in Prop. 1. ■
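
For small ground sets, the two characterizations above can be compared by brute force. The following is a minimal sketch rather than an efficient test: the helper names and the two example functions are hypothetical, sets are represented as Python frozensets, and the second-order check takes $j \neq k$, which is an assumption about the intended reading of Prop. 2.

```python
from itertools import combinations
from math import sqrt


def subsets(ground):
    """Yield every subset of `ground` as a frozenset."""
    items = list(ground)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_submodular_def1(F, ground, tol=1e-12):
    """Definition 1: F(A) + F(B) >= F(A | B) + F(A & B) for all A, B."""
    all_sets = list(subsets(ground))
    return all(F(A) + F(B) >= F(A | B) + F(A & B) - tol
               for A in all_sets for B in all_sets)


def is_submodular_prop2(F, ground, tol=1e-12):
    """Prop. 2: second-order differences, for distinct j, k outside A."""
    V = frozenset(ground)
    for A in subsets(V):
        outside = V - A
        for j in outside:
            for k in outside:
                if j == k:
                    continue  # assumption: the condition is meant for j != k
                gain_small = F(A | {k}) - F(A)
                gain_large = F(A | {j, k}) - F(A | {j})
                if gain_small < gain_large - tol:
                    return False
    return True


# Hypothetical examples: a concave function of |A| is submodular,
# a strictly convex function of |A| is not.
V = range(4)
concave_card = lambda A: sqrt(len(A))
convex_card = lambda A: len(A) ** 2
```

Both checks enumerate all $2^p$ subsets, so they are only meant for sanity checks on toy examples.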



A vector $s \in \mathbb{R}^p$ naturally leads to a modular set-function defined as $s(A) = \sum_{k \in A} s_k =
s^\top 1_A$, where $1_A \in \mathbb{R}^p$ is the indicator vector of the set $A$. We now define specific polyhedra
in $\mathbb{R}^p$. These play a crucial role in submodular analysis, as most results may be interpreted
or proved using such polyhedra.

Definition 2 (Submodular and base polyhedra) Let $F$ be a submodular function such
that $F(\emptyset) = 0$. The submodular polyhedron $P(F)$ and the base polyhedron $B(F)$ are defined
as:

$$P(F) = \{s \in \mathbb{R}^p,\ \forall A \subseteq V,\ s(A) \leq F(A)\}$$

$$B(F) = \{s \in \mathbb{R}^p,\ s(V) = F(V),\ \forall A \subseteq V,\ s(A) \leq F(A)\} = P(F) \cap \{s(V) = F(V)\}.$$

As shown in the following proposition, the submodular polyhedron $P(F)$ has non-empty
interior and is unbounded. Note that the other polyhedron (the base polyhedron) will be
shown below to be non-empty and bounded. It has empty interior
since it is included in the affine hyperplane $s(V) = F(V)$. See Figure 1 for examples with $p = 2$ and
$p = 3$.

Proposition 3 (Properties of submodular polyhedron) Let $F$ be a submodular function
such that $F(\emptyset) = 0$. If $s \in P(F)$, then for all $t \in \mathbb{R}^p$ such that $t \leq s$, we have $t \in P(F)$.
Moreover, $P(F)$ has non-empty interior.

Proof The first part is trivial, since $t(A) \leq s(A)$ if $t \leq s$. For the second part, we only
need to show that $P(F)$ is non-empty, which is true since the constant vector equal to
$\min_{A \subseteq V,\ A \neq \emptyset} \frac{F(A)}{|A|}$ belongs to $P(F)$. ■
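
Membership in the two polyhedra of Def. 2 can be tested directly from the definitions when $p$ is small. The sketch below is an illustration under stated assumptions: the function names and the example $F(A) = \min(|A|, 2)$ are hypothetical, indices run from 0 to $p-1$, and the check enumerates all $2^p$ constraints.

```python
from itertools import combinations


def subsets(p):
    """Yield every subset of {0, ..., p-1} as a frozenset."""
    for r in range(p + 1):
        for combo in combinations(range(p), r):
            yield frozenset(combo)


def in_submodular_polyhedron(s, F, tol=1e-9):
    """s is in P(F) iff s(A) <= F(A) for every subset A."""
    return all(sum(s[k] for k in A) <= F(A) + tol for A in subsets(len(s)))


def in_base_polyhedron(s, F, tol=1e-9):
    """s is in B(F) iff s is in P(F) and s(V) = F(V)."""
    V = frozenset(range(len(s)))
    return in_submodular_polyhedron(s, F, tol) and abs(sum(s) - F(V)) <= tol


def constant_point(F, p):
    """Constant vector used in the proof of Prop. 3: min over A != {} of F(A)/|A|."""
    c = min(F(A) / len(A) for A in subsets(p) if A)
    return [c] * p


# Hypothetical example: F(A) = min(|A|, 2) on a ground set of size 3.
F = lambda A: min(len(A), 2)
p = 3
s0 = constant_point(F, p)            # every entry equals 2/3 here
shifted = [x - 1.0 for x in s0]      # t <= s0, hence still in P(F)
```

In this example `s0` happens to lie in $B(F)$ as well, while `shifted` stays in $P(F)$ but leaves the hyperplane $s(V) = F(V)$.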



2 Lovász extension

We consider a set-function $F$ such that $F(\emptyset) = 0$, which is not necessarily submodular. We
can define its Lovász extension [14], which is often referred to as its Choquet integral [15].
The Lovász extension makes it possible to draw links between submodular set-functions and
regular convex functions, and to transfer known results from convex analysis, such as duality.

Figure 1: Submodular polyhedron $P(F)$ and base polyhedron $B(F)$ for $p = 2$ (left) and
$p = 3$ (right), for a non-decreasing submodular function.

Definition 3 (Lovász extension) Given a set-function $F$ such that $F(\emptyset) = 0$, the Lovász
extension $f : \mathbb{R}^p \to \mathbb{R}$ is defined as follows: for $w \in \mathbb{R}^p$, order the components $w_{j_1} \geq \cdots \geq
w_{j_p}$, and define $f(w)$ through any of the following equations:

$$f(w) = w_{j_1} F(\{j_1\}) + \sum_{k=2}^{p} w_{j_k} \big[ F(\{j_1, \dots, j_k\}) - F(\{j_1, \dots, j_{k-1}\}) \big], \qquad (1)$$

$$\phantom{f(w)} = \sum_{k=1}^{p-1} F(\{j_1, \dots, j_k\}) (w_{j_k} - w_{j_{k+1}}) + F(V) w_{j_p}, \qquad (2)$$

$$\phantom{f(w)} = \int_{\min\{w_1, \dots, w_p\}}^{+\infty} F(\{w \geq z\}) \, dz + F(V) \min\{w_1, \dots, w_p\}, \qquad (3)$$

$$\phantom{f(w)} = \int_{0}^{+\infty} F(\{w \geq z\}) \, dz + \int_{-\infty}^{0} \big[ F(\{w \geq z\}) - F(V) \big] \, dz. \qquad (4)$$

Proof To prove that we actually define a function, one needs to prove that the definition
is independent of the non-unique ordering $w_{j_1} \geq \cdots \geq w_{j_p}$, which is trivial from the
last formulation in Eq. (4). The first and second formulations in Eq. (1) and Eq. (2) are
equivalent (by integration by parts, or Abel summation formula). To show equivalence
with Eq. (3), one may notice that $z \mapsto F(\{w \geq z\})$ is piecewise constant, with value
zero for $z > w_{j_1} = \max\{w_1, \dots, w_p\}$, equal to $F(\{j_1, \dots, j_k\})$ for $z \in (w_{j_{k+1}}, w_{j_k})$,
$k \in \{1, \dots, p-1\}$, and equal to $F(V)$ for $z < w_{j_p} = \min\{w_1, \dots, w_p\}$. What happens at
break points is irrelevant for integration.

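To prove Eq. (4), notice that for $\alpha \leq \min\{0, w_1, \dots, w_p\}$, Eq. (3) leads to:

$$f(w) = \int_{\alpha}^{+\infty} F(\{w \geq z\}) \, dz - \int_{\alpha}^{\min\{w_1, \dots, w_p\}} F(\{w \geq z\}) \, dz + F(V) \min\{w_1, \dots, w_p\}$$

$$\phantom{f(w)} = \int_{\alpha}^{+\infty} F(\{w \geq z\}) \, dz - \int_{\alpha}^{\min\{w_1, \dots, w_p\}} F(V) \, dz + \int_{0}^{\min\{w_1, \dots, w_p\}} F(V) \, dz$$

$$\phantom{f(w)} = \int_{\alpha}^{+\infty} F(\{w \geq z\}) \, dz - \int_{\alpha}^{0} F(V) \, dz,$$

and we get the result by letting $\alpha$ tend to $-\infty$.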



Note that for modular functions $A \mapsto s(A)$, with $s \in \mathbb{R}^p$, the Lovász extension is
the linear function $w \mapsto w^\top s$. The following proposition details classical properties of the
Choquet integral. Property (f) below implies that the Lovász extension is equal to the
original set-function on $\{0, 1\}^p$ (which can canonically be identified with $2^V$), and hence is
indeed an extension of $F$.
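
The first two formulas of Def. 3 translate directly into a sort followed by a single pass over the components. The sketch below is illustrative only: the function names, the example set-functions and the 0-based indexing are assumptions, and set-functions are taken to be Python callables on frozensets with $F(\emptyset) = 0$.

```python
def lovasz_extension(F, w):
    """Eq. (1): sort the components of w in decreasing order and weight
    the successive marginal gains of F along the resulting chain of sets."""
    order = sorted(range(len(w)), key=lambda k: -w[k])
    value, prefix, previous = 0.0, frozenset(), 0.0  # F(empty set) = 0
    for j in order:
        prefix = prefix | {j}
        current = F(prefix)
        value += w[j] * (current - previous)
        previous = current
    return value


def lovasz_extension_eq2(F, w):
    """Eq. (2): the same quantity after Abel summation."""
    order = sorted(range(len(w)), key=lambda k: -w[k])
    total = 0.0
    for k in range(len(order) - 1):
        chain_set = frozenset(order[:k + 1])
        total += F(chain_set) * (w[order[k]] - w[order[k + 1]])
    return total + F(frozenset(order)) * w[order[-1]]


# Hypothetical example data.
F = lambda A: len(A) ** 0.5                  # a submodular function of |A|
m = [1.0, -2.0, 0.5]
F_mod = lambda A: sum(m[k] for k in A)       # a modular function
w = [0.3, -1.2, 0.7]
```

With these definitions, `lovasz_extension(F_mod, w)` reduces to the inner product of `w` and `m`, and evaluating either function at an indicator vector returns the value of the set-function on the corresponding set.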

Proposition 4 (Properties of Lovász extension) Let $F$ be any set-function such that
$F(\emptyset) = 0$. We have:

(a) if $F$ and $G$ are set-functions with Lovász extensions $f$ and $g$, then $f + g$ is the Lovász
extension of $F + G$, and for all $\lambda \in \mathbb{R}_+$, $\lambda f$ is the Lovász extension of $\lambda F$,

(b) for $w \in \mathbb{R}^p_+$, $f(w) = \int_0^{+\infty} F(\{w \geq z\}) \, dz$,

(c) if $F(V) = 0$, for all $w \in \mathbb{R}^p$, $f(w) = \int_{-\infty}^{+\infty} F(\{w \geq z\}) \, dz$,

(d) for all $w \in \mathbb{R}^p$ and $\alpha \in \mathbb{R}$, $f(w + \alpha 1_V) = f(w) + \alpha F(V)$,

(e) the Lovász extension $f$ is positively homogeneous,

(f) for all $A \subseteq V$, $F(A) = f(1_A)$,

(g) if $F$ is symmetric (i.e., $\forall A \subseteq V$, $F(A) = F(V \setminus A)$), then $f$ is even,

(h) if $V = A_1 \cup \cdots \cup A_m$ is a partition of $V$, and $w = \sum_{i=1}^m v_i 1_{A_i}$ (i.e., $w$ is constant on each
set $A_i$), with $v_1 \geq \cdots \geq v_m$, then $f(w) = \sum_{i=1}^{m-1} (v_i - v_{i+1}) F(A_1 \cup \cdots \cup A_i) + v_m F(V)$.

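Proof Properties (a), (b) and (c) are immediate from Eq. (4) and Eq. (2). (d), (e) and
(f) are straightforward from Eq. (2). If $F$ is symmetric, then $F(V) = F(\emptyset) = 0$, and thus $f(-w) =
\int_{-\infty}^{+\infty} F(\{-w \geq z\}) \, dz = \int_{-\infty}^{+\infty} F(\{w \leq -z\}) \, dz = \int_{-\infty}^{+\infty} F(\{w \leq z\}) \, dz = \int_{-\infty}^{+\infty} F(\{w > z\}) \, dz =
f(w)$ (because we may replace strict inequalities by regular inequalities), i.e., $f$ is even.
Finally, (h) follows from Eq. (2) by grouping the components of $w$ that share the same value. ■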

Note that when the function is a cut function, the Lovász extension is related to the
total variation and property (c) is often referred to as the co-area formula (see [16] and
references therein, as well as Section 10.2).

The next result relates the Lovász extension with the support function of the submodular
polyhedron $P(F)$, which is defined in Def. 2. This is the basis for many of the theoretical
results and algorithms related to submodular functions. It shows that maximizing a linear
function with non-negative coefficients on the submodular polyhedron may be obtained in
closed form, by the so-called "greedy algorithm" (see [H] for an intuitive explanation), and 
the optimal value is equal to the value $f(w)$ of the Lovász extension. Note that otherwise,
solving a linear programming problem with $2^p$ constraints would be required.

Proposition 5 (Greedy algorithm) Let $F$ be a submodular function such that $F(\emptyset) =
0$. Let $w \in \mathbb{R}^p_+$. A maximizer of $\max_{s \in P(F)} w^\top s$ may be obtained by the following algorithm:
order the components of $w$ as $w_{j_1} \geq \cdots \geq w_{j_p} \geq 0$ and define $s_{j_k} = F(\{j_1, \dots, j_k\}) -
F(\{j_1, \dots, j_{k-1}\})$. Moreover, for all $w \in \mathbb{R}^p_+$, $\max_{s \in P(F)} w^\top s = f(w)$.
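Proof By convex duality (which applies because $P(F)$ has non-empty interior from
Prop. 3), we have, by introducing Lagrange multipliers $\lambda_A \in \mathbb{R}_+$ for the constraints
$s(A) \leq F(A)$, $A \subseteq V$:

$$\max_{s \in P(F)} w^\top s = \min_{\lambda_A \geq 0,\ A \subseteq V} \ \max_{s \in \mathbb{R}^p} \Big\{ w^\top s - \sum_{A \subseteq V} \lambda_A [s(A) - F(A)] \Big\}$$

$$\phantom{\max_{s \in P(F)} w^\top s} = \min_{\lambda_A \geq 0,\ A \subseteq V} \ \max_{s \in \mathbb{R}^p} \Big\{ \sum_{A \subseteq V} \lambda_A F(A) + \sum_{k=1}^{p} s_k \Big( w_k - \sum_{A \ni k} \lambda_A \Big) \Big\}$$

$$\phantom{\max_{s \in P(F)} w^\top s} = \min_{\lambda_A \geq 0,\ A \subseteq V} \ \sum_{A \subseteq V} \lambda_A F(A) \ \text{ such that } \ \forall k \in V,\ w_k = \sum_{A \ni k} \lambda_A.$$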

If we take the (primal) solution $s$ of the greedy algorithm, we have $f(w) = w^\top s$ from
Eq. (1), and $s$ is feasible (i.e., in $P(F)$), because of the submodularity of $F$. Indeed,
without loss of generality, we assume that $j_k = k$ for all $k \in \{1, \dots, p\}$. We can decompose
$A = A_1 \cup \cdots \cup A_m$, where $A_k = (u_k, v_k]$ are integer intervals. We then have:

$$s(A) = \sum_{k=1}^{m} \big\{ F((0, v_k]) - F((0, u_k]) \big\}$$

$$\phantom{s(A)} \leq \sum_{k=1}^{m} \big\{ F((u_1, v_k]) - F((u_1, u_k]) \big\} \quad \text{by submodularity}$$

$$\phantom{s(A)} = F((u_1, v_1]) + \sum_{k=2}^{m} \big\{ F((u_1, v_k]) - F((u_1, u_k]) \big\}$$

$$\phantom{s(A)} \leq F((u_1, v_1]) + \sum_{k=2}^{m} \big\{ F((u_1, v_1] \cup (u_2, v_k]) - F((u_1, v_1] \cup (u_2, u_k]) \big\} \quad \text{by submodularity}$$

$$\phantom{s(A)} = F((u_1, v_1] \cup (u_2, v_2]) + \sum_{k=3}^{m} \big\{ F((u_1, v_1] \cup (u_2, v_k]) - F((u_1, v_1] \cup (u_2, u_k]) \big\}.$$

By repeatedly applying submodularity, we finally obtain that $s(A) \leq F((u_1, v_1] \cup \cdots \cup (u_m, v_m]) =
F(A)$, i.e., $s \in P(F)$.

Moreover, we can define dual variables $\lambda_{\{j_1, \dots, j_k\}} = w_{j_k} - w_{j_{k+1}}$ for $k \in \{1, \dots, p-1\}$ and
$\lambda_V = w_{j_p}$, with all other $\lambda_A$ equal to zero. Then they are all non-negative (notably because
$w \geq 0$), and satisfy the constraint $\forall k \in V$, $w_k = \sum_{A \ni k} \lambda_A$. Finally, the dual cost function
also has value $f(w)$ (from Eq. (2)). Thus by duality (which holds, because $P(F)$ is not
empty), $s$ is an optimal solution. Note that it is not unique (see Prop. 9 for a description
of the set of solutions). ■
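
The greedy algorithm of Prop. 5 amounts to one sort and $p$ evaluations of $F$. The following sketch is a hedged illustration: the function names and the example $F(A) = \sqrt{|A|}$ are hypothetical, indices are 0-based, and feasibility is verified by brute force over all subsets, which is only practical for small $p$.

```python
from itertools import combinations, permutations


def greedy_vector(F, w):
    """Greedy algorithm of Prop. 5: visit indices by decreasing w and
    assign to each one the marginal gain of F along the way."""
    order = sorted(range(len(w)), key=lambda k: -w[k])
    s = [0.0] * len(w)
    prefix, previous = frozenset(), 0.0  # F(empty set) = 0
    for j in order:
        prefix = prefix | {j}
        current = F(prefix)
        s[j] = current - previous
        previous = current
    return s


def in_submodular_polyhedron(s, F, tol=1e-9):
    """Brute-force check of s(A) <= F(A) over all 2^p subsets."""
    p = len(s)
    return all(sum(s[k] for k in A) <= F(frozenset(A)) + tol
               for r in range(p + 1) for A in combinations(range(p), r))


# Hypothetical example: F(A) = sqrt(|A|) with non-negative weights.
F = lambda A: len(A) ** 0.5
w = [0.5, 2.0, 0.0, 1.0]
s = greedy_vector(F, w)
value = sum(wk * sk for wk, sk in zip(w, s))   # equals f(w) by Eq. (1)
```

Running the same routine with weights that encode an arbitrary ordering of $V$ produces the other greedy outputs, which gives a simple way to compare `value` against competing points of the base polyhedron.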



The next proposition draws precise links between convexity and submodularity, by showing
that a set-function $F$ is submodular if and only if its Lovász extension $f$ is convex. This
is further developed in Prop. 7, where it is shown that minimizing $F$ on $2^V$ (which is
equivalent to minimizing $f$ on $\{0, 1\}^p$ since $f$ is an extension of $F$) and minimizing $f$ on
$[0, 1]^p$ are equivalent (when $F$ is submodular).

Proposition 6 (Convexity and submodularity) A set-function $F$ is submodular if and
only if its Lovász extension $f$ is convex.
Proof Let $A, B \subseteq V$. The vector $1_{A \cup B} + 1_{A \cap B} = 1_A + 1_B$ has components equal to 0 (on
$V \setminus (A \cup B)$), 2 (on $A \cap B$) and 1 (on $A \Delta B = (A \setminus B) \cup (B \setminus A)$). Therefore, with
$w = 1_A + 1_B$, $f(1_{A \cup B} + 1_{A \cap B}) =
\int_0^2 F(\{w \geq z\}) \, dz = \int_0^1 F(A \cup B) \, dz + \int_1^2 F(A \cap B) \, dz = F(A \cup B) + F(A \cap B)$.

If $f$ is convex, then by homogeneity, $f(1_A + 1_B) \leq f(1_A) + f(1_B)$, which is equal to
$F(A) + F(B)$, and thus $F$ is submodular.

If $F$ is submodular, then by Proposition 5, for all $w \in \mathbb{R}^p_+$, $f(w)$ is a maximum of linear
functions, thus it is convex on $\mathbb{R}^p_+$. Moreover, because $f(w + \alpha 1_V) = f(w) + \alpha F(V)$, it is
convex on $\mathbb{R}^p$. ■
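
As a small worked instance of Prop. 6, consider $p = 2$. The symbols $a$, $b$ and $c$ below are introduced here for illustration only and stand for $F(\{1\})$, $F(\{2\})$ and $F(V)$.

```latex
% Lovasz extension for p = 2, read off from Eq. (1):
f(w) =
\begin{cases}
  a\, w_1 + (c - a)\, w_2 & \text{if } w_1 \geq w_2, \\
  b\, w_2 + (c - b)\, w_1 & \text{if } w_2 \geq w_1.
\end{cases}

% Difference between the two linear forms:
\big[ a\, w_1 + (c - a)\, w_2 \big] - \big[ b\, w_2 + (c - b)\, w_1 \big]
  = (a + b - c)\,(w_1 - w_2).
```

Hence $f$ is the pointwise maximum of the two linear forms, and thus convex, exactly when $a + b \geq c$, which is the only non-trivial submodularity inequality for $p = 2$. When $a + b < c$, $f$ is their pointwise minimum and is not convex.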



The next proposition completes Prop. 6 by showing that minimizing the Lovász extension
on $[0, 1]^p$ is equivalent to minimizing it on $\{0, 1\}^p$, and hence to minimizing the set-function
$F$ on $2^V$ (when $F$ is submodular).

Proposition 7 (Minimization of submodular functions) Let $F$ be a submodular function
and $f$ its Lovász extension; then $\min_{A \subseteq V} F(A) = \min_{w \in [0,1]^p} f(w)$.

Proof Because $f$ is an extension from $\{0, 1\}^p$ to $[0, 1]^p$ (property (f) from Proposition 4),
we must have $\min_{A \subseteq V} F(A) = \min_{w \in \{0,1\}^p} f(w) \geq \min_{w \in [0,1]^p} f(w)$. For the other
inequality, any $w \in [0, 1]^p$ may be decomposed as $w = \sum_{i=1}^p \lambda_i 1_{A_i}$, where $A_1 \subseteq \cdots \subseteq A_p = V$,
and where $\lambda$ is non-negative and has a sum smaller than or equal to one (this can be
obtained by considering $A_i$ the set of indices of the $i$ largest values of $w$). We then have

$$f(w) = \sum_{i=1}^{p} \int_{\lambda_{i+1} + \cdots + \lambda_p}^{\lambda_i + \cdots + \lambda_p} F(A_i) \, dz = \sum_{i=1}^{p} \lambda_i F(A_i) \geq \sum_{i=1}^{p} \lambda_i \min_{A \subseteq V} F(A) \geq \min_{A \subseteq V} F(A)$$

(because $\min_{A \subseteq V} F(A) \leq 0$). This leads to the desired result. ■
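
The decomposition used in this proof also suggests a simple rounding step: among the nested sets $A_i$ (and the empty set), at least one has a value of $F$ no larger than $f(w)$. The sketch below illustrates this numerically; the function names, the example function (a concave function of $|A|$ plus a modular term) and the random test points are all assumptions made for illustration.

```python
import random


def lovasz_extension(F, w):
    """Eq. (1): marginal gains of F along the decreasing ordering of w."""
    order = sorted(range(len(w)), key=lambda k: -w[k])
    value, prefix, previous = 0.0, frozenset(), 0.0
    for j in order:
        prefix = prefix | {j}
        current = F(prefix)
        value += w[j] * (current - previous)
        previous = current
    return value


def chain_sets(w):
    """Empty set followed by the sets of indices of the i largest values of w."""
    order = sorted(range(len(w)), key=lambda k: -w[k])
    return [frozenset(order[:i]) for i in range(len(w) + 1)]


def round_to_set(F, w):
    """Return the set of the chain with the smallest value of F."""
    return min(chain_sets(w), key=F)


# Hypothetical submodular example: sqrt(|A|) plus a modular term.
modular_part = [-0.9, 0.4, -0.6, 0.2]
F = lambda A: len(A) ** 0.5 + sum(modular_part[k] for k in A)
p = 4
all_sets = [frozenset(k for k in range(p) if (mask >> k) & 1)
            for mask in range(2 ** p)]
min_F = min(F(A) for A in all_sets)

rng = random.Random(0)
samples = [[rng.random() for _ in range(p)] for _ in range(200)]
```

For each sampled `w` in $[0, 1]^p$, `F(round_to_set(F, w))` is at most `lovasz_extension(F, w)`, which in turn is at least `min_F`, in line with Prop. 7.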



3 Support function of submodular and base polyhedra 

The next proposition completes Prop. 5 by computing the full support function of $B(F)$
and $P(F)$ (see [9, 10] for definitions of support functions), i.e., computing $\max_{s \in B(F)} w^\top s$
and $\max_{s \in P(F)} w^\top s$ for all possible $w$ (with positive and/or negative coefficients). Note the
different behaviors for $B(F)$ and $P(F)$.

Proposition 8 (Support function of submodular and base polyhedra) Let $F$ be a
submodular function such that $F(\emptyset) = 0$. We have:

(a) for all $w \in \mathbb{R}^p$, $\max_{s \in B(F)} w^\top s = f(w)$,

(b) if $w \in \mathbb{R}^p_+$, $\max_{s \in P(F)} w^\top s = f(w)$,

(c) if there exists $j$ such that $w_j < 0$, then $\max_{s \in P(F)} w^\top s = +\infty$.

Proof (a) From the proof of Prop. 5, for $w \in \mathbb{R}^p_+$, the result of the greedy algorithm
satisfies $s(V) = F(V)$, and hence (a) is true on $\mathbb{R}^p_+$. For all $w$, for $\alpha$ large enough, $w +
\alpha 1_V \geq 0$, and thus $f(w) + \alpha F(V) = f(w + \alpha 1_V) = \max_{s \in B(F)} (w + \alpha 1_V)^\top s = \alpha F(V) +
\max_{s \in B(F)} w^\top s$, i.e., (a) is true.
Property (b) is shown in Proposition 5. For (c), notice that $s(\lambda) = s_0 - \lambda \delta_j \in P(F)$ for
$\lambda \to +\infty$ and $s_0 \in P(F)$, where $\delta_j$ denotes the indicator vector of $\{j\}$, and that $w^\top s(\lambda) \to +\infty$. ■



The next proposition shows necessary and sufficient conditions for optimality in the defini- 
tion of support functions. Note that Prop. [5] gave one example obtained from the greedy 
algorithm, and that we can now characterize all maximizers. Moreover, note that the 
maximizer is unique only when w has distinct values, and otherwise, the ordering of the 
components of w is not unique, and hence, the greedy algorithm may have multiple out- 
puts (and all convex combinations of these are also solutions). The following proposition 
essentially shows what is exactly needed to be a maximizer. 

Proposition 9 (Maximizers of the support function of submodular polyhedron)
Let $F$ be a submodular function such that $F(\emptyset) = 0$. Let $w \in (\mathbb{R}^*_+)^p$, with unique
values $v_1 > \cdots > v_m > 0$, taken at sets $A_1, \dots, A_m$ (i.e., $V = A_1 \cup \cdots \cup A_m$ and $\forall i \in
\{1, \dots, m\}$, $\forall k \in A_i$, $w_k = v_i$). Then $s$ is optimal for $\max_{s \in P(F)} w^\top s$ if and only if for all
$i = 1, \dots, m$, $s(A_1 \cup \cdots \cup A_i) = F(A_1 \cup \cdots \cup A_i)$.

Proof Let $B_i = A_1 \cup \cdots \cup A_i$, for $i = 1, \dots, m$. From the optimization problems defined
in the proof of Prop. 5, let $\lambda_V = v_m > 0$, and $\lambda_{B_i} = v_i - v_{i+1} > 0$ for $i < m$, with all other
$\lambda_A$, $A \subseteq V$, equal to zero. Such $\lambda$ is optimal (because the dual function is equal to $f(w)$).

Let $s \in P(F)$. We have:

$$\sum_{A \subseteq V} \lambda_A F(A) = v_m F(V) + \sum_{i=1}^{m-1} F(B_i)(v_i - v_{i+1})$$

$$\phantom{\sum_{A \subseteq V} \lambda_A F(A)} = v_m \big( F(V) - s(V) \big) + \sum_{i=1}^{m-1} [F(B_i) - s(B_i)](v_i - v_{i+1}) + v_m s(V) + \sum_{i=1}^{m-1} s(B_i)(v_i - v_{i+1})$$

$$\phantom{\sum_{A \subseteq V} \lambda_A F(A)} \geq v_m s(V) + \sum_{i=1}^{m-1} s(B_i)(v_i - v_{i+1}) = s^\top w.$$

Thus $s$ is optimal if and only if the primal objective value $s^\top w$ is equal to the optimal
dual objective value $\sum_{A \subseteq V} \lambda_A F(A)$, and thus, if and only if there is equality in all above
inequalities, hence the desired result. ■

Note that if $v_m = 0$ in Prop. 9 (i.e., we take $w \in \mathbb{R}^p_+$ and there is a $w_k$ equal to zero), then
the optimality condition is that for all $i = 1, \dots, m-1$, $s(A_1 \cup \cdots \cup A_i) = F(A_1 \cup \cdots \cup A_i)$
(i.e., we do not need that $s(V) = F(V)$, i.e., the optimal solution is not necessarily in the
base polyhedron).

Proposition 10 (Maximizers of the support function of base polyhedron) Let $F$
be a submodular function such that $F(\emptyset) = 0$. Let $w \in \mathbb{R}^p$, with unique values $v_1 > \cdots >
v_m$, taken at sets $A_1, \dots, A_m$. Then $s$ is optimal for $\max_{s \in B(F)} w^\top s$ if and only if for all
$i = 1, \dots, m$, $s(A_1 \cup \cdots \cup A_i) = F(A_1 \cup \cdots \cup A_i)$.

Proof The proof follows the same arguments as for Prop. 9. ■



Given the last proposition, we may now give necessary and sufficient conditions for charac- 
terizing faces of the base polyhedron. We first characterize when the base polyhedron B(F) 
has full relative interior. 

Definition 4 (Inseparable set) Let $F$ be a submodular function such that $F(\emptyset) = 0$. A
set $A \subseteq V$ is said to be separable if and only if there is a set $B \subseteq A$, such that $B \neq \emptyset$, $B \neq A$ and
$F(A) = F(B) + F(A \setminus B)$. If $A$ is not separable, $A$ is said to be inseparable.

Proposition 11 (Full-dimensional base polyhedron) Let F be a submodular function 
such that F(0) = 0. The base polyhedron has full relative interior if and only if V is not 
separable. 

Proof If $V$ is separable into $A$ and $V \setminus A$, then for all $s \in B(F)$, we must have $s(A) = F(A)$
(since $s(A) \leq F(A)$, $s(V \setminus A) \leq F(V \setminus A)$ and $s(V) = F(V) = F(A) + F(V \setminus A)$),
and hence the base polyhedron is included in the intersection of two affine hyperplanes, i.e.,
$B(F)$ does not have full relative interior in $\{s(V) = F(V)\}$.

We now assume that $B(F)$ is included in $\{s(A) = F(A)\}$, for $A$ a non-empty strict
subset of $V$. Then $B(F)$ can be factorized into $B(F_A) \times B(F^A)$, where $F_A$ is the restriction
of $F$ to $A$ and $F^A$ the contraction of $F$ on $A$. Indeed, if $s \in B(F)$, then $s_A \in B(F_A)$
because $s(A) = F(A)$, and $s_{V \setminus A} \in B(F^A)$, because for $B \subseteq V \setminus A$, $s_{V \setminus A}(B) = s(B) =
s(A \cup B) - s(A) \leq F(A \cup B) - F(A)$. Similarly, if $s \in B(F_A) \times B(F^A)$, then for all sets
$B \subseteq V$, $s(B) = s(A \cap B) + s((V \setminus A) \cap B) \leq F(A \cap B) + F(A \cup B) - F(A) \leq F(B)$ by
submodularity, and $s(A) = F(A)$.

This shows that $f(w) = f_A(w_A) + f^A(w_{V \setminus A})$, which implies that $F(V) = F(A) + F(V \setminus A)$,
when applied to $w = 1_{V \setminus A}$, i.e., $V$ is separable. ■



We can now detail the facial structure of the base polyhedron, which will be dual to the one
of the polyhedron defined by $\{w \in \mathbb{R}^p,\ f(w) \leq 1\}$ (i.e., a level set of the Lovász extension).
As the base polyhedron $B(F)$ is a polytope in dimension $p - 1$ (because it is bounded and
contained in the affine hyperplane $\{s(V) = F(V)\}$), one can define a set of faces. Faces are
the intersections of the polyhedron $B(F)$ with any of its supporting hyperplanes. Supporting
hyperplanes are themselves defined as the hyperplanes $w^\top s = \max_{s \in B(F)} w^\top s = f(w)$ for
$w \in \mathbb{R}^p$. From Prop. 10, faces (which potentially have empty relative interior) are obtained as
the intersection of $B(F)$ with $s(A_1 \cup \cdots \cup A_i) = F(A_1 \cup \cdots \cup A_i)$ for an ordered partition
of $V$. Together with Prop. 11, we can now provide a characterization of the faces of $B(F)$.

Proposition 12 (Faces of the base polyhedron) Let $A_1 \cup \cdots \cup A_m$ be an ordered
partition of $V$, such that for all $j \in \{1, \dots, m\}$, $A_j$ is inseparable for the function $G_j : B \mapsto
F(A_1 \cup \cdots \cup A_{j-1} \cup B) - F(A_1 \cup \cdots \cup A_{j-1})$ defined on subsets of $A_j$; then the set of bases
$s \in B(F)$ such that for all $j \in \{1, \dots, m\}$, $s(A_1 \cup \cdots \cup A_j) = F(A_1 \cup \cdots \cup A_j)$ is a proper
face of $B(F)$ with non-empty relative interior.

Proof We have a face from Prop. 10, and it has non-empty relative interior by applying Prop. 11
on each submodular function $G_j$. ■
The next proposition computes the Fenchel conjugate of the Lovász extension restricted
to $[0, 1]^p$, noting that by Prop. 8 the regular Fenchel conjugate of the unrestricted Lovász
extension is the indicator function of the base polyhedron (for a definition of Fenchel
conjugates, see [9, 10]). This allows a form of conjugacy between set-functions and convex
functions.

Proposition 13 (Conjugate of a submodular function) Let $F$ be a submodular function
such that $F(\emptyset) = 0$. The conjugate $\tilde{f} : \mathbb{R}^p \to \mathbb{R}$ of $F$ is defined as $\tilde{f}(s) = \max_{A \subseteq V} s(A) -
F(A)$. Then, the conjugate function $\tilde{f}$ is convex, and is equal to the Fenchel-conjugate of the
Lovász extension restricted to $[0, 1]^p$. Moreover, for all $A \subseteq V$, $F(A) = \max_{s \in \mathbb{R}^p} s(A) - \tilde{f}(s)$.

Proof The function $\tilde{f}$ is a maximum of linear functions and thus it is convex. We have for
$s \in \mathbb{R}^p$:

$$\max_{w \in [0,1]^p} w^\top s - f(w) = \max_{A \subseteq V} s(A) - F(A) = \tilde{f}(s)$$

because $F - s$ is submodular and because of Proposition 7, which leads to the first desired
result. The last assertion is a direct consequence of the fact that $F(A) = f(1_A)$. ■



4 Minimizers of submodular functions 

In this section, we review some relevant results for submodular function minimization (for
which algorithms are presented in Section 8).

Proposition 14 (Property of minimizers of submodular functions) Let $F$ be a submodular
function such that $F(\emptyset) = 0$. The set $A \subseteq V$ is a minimizer of $F$ on $2^V$ if and
only if $A$ is a minimizer of the function from $2^A$ to $\mathbb{R}$ defined as $B \subseteq A \mapsto F(B)$, and if
$\emptyset$ is a minimizer of the function from $2^{V \setminus A}$ to $\mathbb{R}$ defined as $B \subseteq V \setminus A \mapsto F(B \cup A) - F(A)$.

Proof The set of two conditions is clearly necessary. To show that it is sufficient, we let
$B \subseteq V$; we have: $F(A) + F(B) \geq F(A \cup B) + F(A \cap B) \geq F(A) + F(A)$, by using the
submodularity of $F$ and then the set of two conditions. This implies that $F(A) \leq F(B)$,
for all $B \subseteq V$, hence the desired result. ■
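
On a small example, the two local conditions of Prop. 14 can be compared with global minimality by enumeration. The sketch below is illustrative: the helper names and the example function are hypothetical, and a small tolerance is used for floating-point comparisons.

```python
def all_subsets(items):
    """List every subset of `items` as a frozenset."""
    items = sorted(items)
    return [frozenset(x for i, x in enumerate(items) if (mask >> i) & 1)
            for mask in range(2 ** len(items))]


def satisfies_prop14(F, A, V, tol=1e-9):
    """Check the two local conditions of Prop. 14 for a candidate set A."""
    A, V = frozenset(A), frozenset(V)
    # A minimizes B -> F(B) among subsets of A.
    best_inside = all(F(A) <= F(B) + tol for B in all_subsets(A))
    # The empty set minimizes B -> F(A | B) - F(A) among subsets of V \ A.
    no_gain_outside = all(F(A | B) - F(A) >= -tol for B in all_subsets(V - A))
    return best_inside and no_gain_outside


# Hypothetical submodular example: sqrt(|A|) plus a modular term.
modular_part = [-0.9, 0.4, -0.6, 0.2]
F = lambda A: len(A) ** 0.5 + sum(modular_part[k] for k in A)
V = range(4)

min_F = min(F(A) for A in all_subsets(V))
minimizers = [A for A in all_subsets(V) if F(A) <= min_F + 1e-9]
certified = [A for A in all_subsets(V) if satisfies_prop14(F, A, V)]
```

For this example the two lists coincide and contain the single set $\{0, 2\}$.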



The following proposition provides a useful step towards submodular function minimization.
In fact, it is the starting point of most polynomial-time algorithms presented in Section 8.

Proposition 15 (Dual of minimization of submodular functions) Let $F$ be a submodular
function such that $F(\emptyset) = 0$. We have:

$$\min_{A \subseteq V} F(A) = \max_{s \in B(F)} s_-(V), \qquad (5)$$

where $s_- = \min\{s, 0\}$. Moreover, given $A \subseteq V$ and $s \in B(F)$, we always have $F(A) \geq
s_-(V)$ with equality if and only if $\{s < 0\} \subseteq A \subseteq \{s \leq 0\}$ and $A$ is tight for $s$, i.e.,
$s(A) = F(A)$.

We also have

$$\min_{A \subseteq V} F(A) = \max_{s \in P(F),\ s \leq 0} s(V). \qquad (6)$$

Moreover, given $A \subseteq V$ and $s \in P(F)$ such that $s \leq 0$, we always have $F(A) \geq s(V)$ with
equality if and only if $\{s < 0\} \subseteq A$ and $A$ is tight for $s$, i.e., $s(A) = F(A)$.

Proof We have, by convex duality, and Props. 7 and 8:

$$\min_{A \subseteq V} F(A) = \min_{w \in [0,1]^p} f(w) = \min_{w \in [0,1]^p} \max_{s \in B(F)} w^\top s = \max_{s \in B(F)} \min_{w \in [0,1]^p} w^\top s = \max_{s \in B(F)} s_-(V).$$

Strong duality indeed holds because of Slater's condition ($[0, 1]^p$ has non-empty interior).
Moreover, we have, for all $A \subseteq V$ and $s \in B(F)$:

$$F(A) \geq s(A) = s(A \cap \{s < 0\}) + s(A \cap \{s > 0\}) \geq s(A \cap \{s < 0\}) \geq s_-(V)$$

with equality if and only if there is equality in the three inequalities. The first one leads to $s(A) = F(A)$.
The second one leads to $A \cap \{s > 0\} = \emptyset$, and the last one leads to $\{s < 0\} \subseteq A$. Moreover,

$$\max_{s \in P(F),\ s \leq 0} s(V) = \max_{s \in P(F)} \min_{w \geq 0} \ s^\top 1_V - w^\top s = \min_{w \geq 0} \max_{s \in P(F)} \ s^\top 1_V - w^\top s$$

$$\phantom{\max_{s \in P(F),\ s \leq 0} s(V)} = \min_{1 \geq w \geq 0} f(1_V - w) \quad \text{because of property (c) in Prop. 8}$$

$$\phantom{\max_{s \in P(F),\ s \leq 0} s(V)} = \min_{A \subseteq V} F(A) \quad \text{because of Prop. 7.}$$

Moreover, given $s \in P(F)$ such that $s \leq 0$ and $A \subseteq V$, we have:

$$F(A) \geq s(A) = s(A \cap \{s < 0\}) \geq s(V)$$

with equality if and only if $A$ is tight and $\{s < 0\} \subseteq A$. ■
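
The weak duality inequality $F(A) \geq s_-(V)$ and the tightness conditions of Prop. 15 can be observed on a two-element example. The sketch below is illustrative only: the table of values and the function names are hypothetical, and the bases examined are the greedy outputs of Prop. 5. In general the maximum in Eq. (5) need not be attained at such a point; in this particular example it is.

```python
from itertools import permutations

# Hypothetical submodular function on V = {0, 1}, given as a table.
F_table = {
    frozenset(): 0.0,
    frozenset({0}): -1.0,
    frozenset({1}): 1.0,
    frozenset({0, 1}): -0.5,
}
F = F_table.__getitem__


def base_vertex(F, order):
    """Point of B(F) produced by the greedy algorithm for the given ordering."""
    s = [0.0] * len(order)
    prefix, previous = frozenset(), 0.0
    for j in order:
        prefix = prefix | {j}
        current = F(prefix)
        s[j] = current - previous
        previous = current
    return s


def negative_part_sum(s):
    """s_-(V): sum over k of min(s_k, 0)."""
    return sum(min(x, 0.0) for x in s)


min_F = min(F_table.values())
vertices = [base_vertex(F, order) for order in permutations(range(2))]
dual_values = [negative_part_sum(s) for s in vertices]
```

Here the ordering (0, 1) yields $s = (-1, 0.5)$, for which $s_-(V) = -1 = F(\{0\})$: the set $A = \{0\}$ satisfies $\{s < 0\} \subseteq A \subseteq \{s \leq 0\}$ and is tight for $s$.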



5 Operations that preserve submodularity 

In this section, we present several ways of building submodular functions from existing ones.
For all of these, we describe how the Lovász extensions and the submodular polyhedra are
affected. Note that in many cases, operations are simpler in terms of polyhedra.

Proposition 16 (Restriction of a submodular function) Let $F$ be a submodular function
such that $F(\emptyset) = 0$ and $A \subseteq V$. The restriction of $F$ on $A$, denoted $F_A$, is a set-function
on $A$ defined as $F_A(B) = F(B)$ for $B \subseteq A$. The function $F_A$ is submodular. Moreover, if
we can write the Lovász extension of $F$ as $f(w) = f(w_A, w_{V \setminus A})$, then the Lovász extension
of $F_A$ is $f_A(w_A) = f(w_A, 0)$. Moreover, the submodular polyhedron $P(F_A)$ is simply the
projection of $P(F)$ on the components indexed by $A$, i.e., $s \in P(F_A)$ if and only if $\exists t$ such
that $(s, t) \in P(F)$.

Proof Submodularity and the form of the Lovász extension are straightforward from
definitions. To obtain the submodular polyhedron, notice that we have $f_A(w_A) = f(w_A, 0) =
\max_{(s,t) \in P(F)} w_A^\top s + 0^\top t$, which implies the desired result; this shows that the
Fenchel-conjugate of the Lovász extension is the indicator function of a polyhedron. ■






Proposition 17 (Contraction of a submodular function) Let $F$ be a submodular function
such that $F(\emptyset) = 0$ and $A \subseteq V$. The contraction of $F$ on $A$, denoted $F^A$, is a
set-function on $V \setminus A$ defined as $F^A(B) = F(A \cup B) - F(A)$ for $B \subseteq V \setminus A$. The function $F^A$ is
submodular. Moreover, if we can write the Lovász extension of $F$ as $f(w) = f(w_A, w_{V \setminus A})$,
then the Lovász extension of $F^A$ is $f^A(w_{V \setminus A}) = f(1_A, w_{V \setminus A}) - F(A)$. Moreover, the
submodular polyhedron $P(F^A)$ is simply the projection of $P(F) \cap \{s(A) = F(A)\}$ on the
components indexed by $V \setminus A$, i.e., $t \in P(F^A)$ if and only if $\exists s \in P(F) \cap \{s(A) = F(A)\}$, such
that $s_{V \setminus A} = t$.

Proof Submodularity and the form of the Lovász extension are straightforward from
definitions. Let $t \in \mathbb{R}^{|V \setminus A|}$. If $\exists s \in P(F) \cap \{s(A) = F(A)\}$, such that $s_{V \setminus A} = t$, then
we have for all $B \subseteq V \setminus A$, $t(B) = t(B) + s(A) - F(A) \leq F(A \cup B) - F(A)$, and hence
$t \in P(F^A)$. If $t \in P(F^A)$, then take any $v \in B(F_A)$ and concatenate $v$ and $t$ into $s$. Then,
for all subsets $C \subseteq V$, $s(C) = s(C \cap A) + s(C \cap (V \setminus A)) = v(C \cap A) + t(C \cap (V \setminus A)) \leq
F(C \cap A) + F(A \cup (C \cap (V \setminus A))) - F(A) = F(C \cap A) + F(A \cup C) - F(A) \leq F(C)$ by
submodularity. Hence $s \in P(F)$.
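
Restriction and contraction are easy to express as wrappers around an existing set-function. The sketch below is a minimal illustration: the function names and the example function are hypothetical, sets are frozensets of 0-based indices, and submodularity of the results is checked by brute force on a small ground set.

```python
def all_subsets(items):
    """List every subset of `items` as a frozenset."""
    items = sorted(items)
    return [frozenset(x for i, x in enumerate(items) if (mask >> i) & 1)
            for mask in range(2 ** len(items))]


def is_submodular(F, ground, tol=1e-9):
    """Brute-force check of Definition 1 on the given ground set."""
    sets = all_subsets(ground)
    return all(F(A) + F(B) >= F(A | B) + F(A & B) - tol
               for A in sets for B in sets)


def restriction(F, A):
    """F_A(B) = F(B) for B a subset of A (Prop. 16)."""
    A = frozenset(A)

    def F_restricted(B):
        B = frozenset(B)
        if not B <= A:
            raise ValueError("B must be a subset of A")
        return F(B)
    return F_restricted


def contraction(F, A):
    """F^A(B) = F(A | B) - F(A) for B disjoint from A (Prop. 17)."""
    A = frozenset(A)
    base = F(A)

    def F_contracted(B):
        B = frozenset(B)
        if B & A:
            raise ValueError("B must be disjoint from A")
        return F(A | B) - base
    return F_contracted


# Hypothetical submodular example: sqrt(|A|) plus a modular term.
modular_part = [-0.9, 0.4, -0.6, 0.2]
F = lambda A: len(A) ** 0.5 + sum(modular_part[k] for k in A)
V = frozenset(range(4))
A = frozenset({0, 1})
F_A = restriction(F, A)
F_con = contraction(F, A)
```

By construction `F_A(A) + F_con(V - A)` equals `F(V)`, which is the identity used when factorizing the base polyhedron in the proof of Prop. 11.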



The next proposition shows how to build a new submodular function from an existing one,
by partial minimization. Note the similarity (and the difference) between the submodular
polyhedra for a partial minimum (Prop. 18) and for the restriction defined in Prop. 16.

Proposition 18 (Partial minimum of a submodular function) We consider a submodular
function $G$ on $V \cup W$, where $V \cap W = \emptyset$ (and $|W| = q$), with Lovász extension
$g : \mathbb{R}^{p+q} \to \mathbb{R}$. We consider, for $A \subseteq V$, $F(A) = \min_{B \subseteq W} G(A \cup B) - \min_{B \subseteq W} G(B)$. The
set-function $F$ is submodular and such that $F(\emptyset) = 0$. Its Lovász extension is such that for
all $w \in [0, 1]^p$, $f(w) = \min_{v \in [0,1]^q} g(w, v) - \min_{v \in [0,1]^q} g(0, v)$. Moreover, if $\min_{B \subseteq W} G(B) =
0$, we have for all $w \in \mathbb{R}^p_+$, $f(w) = \min_{v \in \mathbb{R}^q_+} g(w, v)$, and the submodular polyhedron $P(F)$
is the set of $s \in \mathbb{R}^p$ such that there exists $t \in \mathbb{R}^q_+$, such that $(s, t) \in P(G)$.

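Proof Define $c = \min_{B \subseteq W} G(B)$, which is independent of $A$. We have, for $A, A' \subseteq V$, and
any $B, B' \subseteq W$, by definition of $F$:

$$F(A \cup A') + F(A \cap A') \leq -2c + G([A \cup A'] \cup [B \cup B']) + G([A \cap A'] \cup [B \cap B'])$$

$$\phantom{F(A \cup A') + F(A \cap A')} = -2c + G([A \cup B] \cup [A' \cup B']) + G([A \cup B] \cap [A' \cup B'])$$

$$\phantom{F(A \cup A') + F(A \cap A')} \leq -2c + G(A \cup B) + G(A' \cup B') \quad \text{by submodularity.}$$

Minimizing with respect to $B$ and $B'$ leads to the submodularity of $F$.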

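Following Prop. 13, we can get the conjugate function $\tilde{f}$ from the one $\tilde{g}$ of $G$. For $s \in \mathbb{R}^p$,
we have, by definition, $\tilde{f}(s) = \max_{A \subseteq V} s(A) - F(A) = \max_{A \cup B \subseteq V \cup W} s(A) + c - G(A \cup B) =
c + \tilde{g}(s, 0)$. We thus get from Prop. 13 that for $w \in [0, 1]^p$,

$$f(w) = \max_{s \in \mathbb{R}^p} w^\top s - \tilde{f}(s)$$

$$\phantom{f(w)} = \max_{s \in \mathbb{R}^p} w^\top s - \tilde{g}(s, 0) - c$$

$$\phantom{f(w)} = \max_{s \in \mathbb{R}^p} \min_{(\tilde{w}, v) \in [0,1]^{p+q}} w^\top s - \tilde{w}^\top s + g(\tilde{w}, v) - c \quad \text{by applying Prop. 13}$$

$$\phantom{f(w)} = \min_{(\tilde{w}, v) \in [0,1]^{p+q}} \max_{s \in \mathbb{R}^p} w^\top s - \tilde{w}^\top s + g(\tilde{w}, v) - c$$

$$\phantom{f(w)} = \min_{v \in [0,1]^q} g(w, v) - c \quad \text{by maximizing with respect to } s.$$

Note that $c = \min_{B \subseteq W} G(B) = \min_{v \in [0,1]^q} g(0, v)$.

For any $w \in \mathbb{R}^p_+$, for any $\lambda \geq \|w\|_\infty$, we have $w / \lambda \in [0, 1]^p$, and thus

$$f(w) = \lambda f(w / \lambda) = \min_{v \in [0,1]^q} \lambda g(w / \lambda, v) - c \lambda = \min_{v \in [0,1]^q} g(w, \lambda v) - c \lambda = \min_{v \in [0, \lambda]^q} g(w, v) - c \lambda.$$

Thus, if $c = 0$, we have $f(w) = \min_{v \in \mathbb{R}^q_+} g(w, v)$, by letting $\lambda \to +\infty$. We then also have:

$$f(w) = \min_{v \in \mathbb{R}^q_+} g(w, v) = \min_{v \in \mathbb{R}^q_+} \max_{(s,t) \in P(G)} w^\top s + v^\top t = \max_{(s,t) \in P(G),\ t \in \mathbb{R}^q_+} w^\top s.$$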



The following propositions give an interpretation of the intersection between the submodular
polyhedron and sets of the form $\{s \leq z\}$ and $\{s \geq z\}$.

Proposition 19 (Convolution of a submodular function and a modular function)
Let $F$ be a submodular function such that $F(\emptyset) = 0$ and $z \in \mathbb{R}^p$. Define $G(A) =
\min_{B \subseteq A} F(B) + z(A \setminus B)$. Then $G$ is submodular and the submodular polyhedron $P(G)$ is
equal to $P(F) \cap \{s \leq z\}$. Moreover, for all $A \subseteq V$, $G(A) \leq F(A)$ and $G(A) \leq z(A)$.

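Proof Let $A, A' \subseteq V$, and $B, B'$ the corresponding minimizers defining $G(A)$ and $G(A')$.
We have:

$$G(A) + G(A') = F(B) + z(A \setminus B) + F(B') + z(A' \setminus B')$$

$$\phantom{G(A) + G(A')} \geq F(B \cup B') + F(B \cap B') + z(A \setminus B) + z(A' \setminus B') \quad \text{by submodularity}$$

$$\phantom{G(A) + G(A')} = F(B \cup B') + F(B \cap B') + z([A \cup A'] \setminus [B \cup B']) + z([A \cap A'] \setminus [B \cap B'])$$

$$\phantom{G(A) + G(A')} \geq G(A \cup A') + G(A \cap A') \quad \text{by definition of } G,$$

hence the submodularity of $G$. If $s \in P(G)$, then $\forall B \subseteq A \subseteq V$, $s(A) \leq G(A) \leq F(B) +
z(A \setminus B)$. From $B = A$, we get that $s \in P(F)$; from $B = \emptyset$, we get $s \leq z$, and hence
$s \in P(F) \cap \{s \leq z\}$. If $s \in P(F) \cap \{s \leq z\}$, for all $B \subseteq A \subseteq V$, $s(A) = s(A \setminus B) + s(B) \leq
z(A \setminus B) + F(B)$; by minimizing with respect to $B$, we get that $s \in P(G)$.

We get $G(A) \leq F(A)$ by taking $B = A$ in the definition of $G(A)$, and we get $G(A) \leq z(A)$
by taking $B = \emptyset$. ■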
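
The convolution of Prop. 19 can be evaluated by enumerating the subsets $B \subseteq A$, which makes the identity $P(G) = P(F) \cap \{s \leq z\}$ easy to probe numerically on a toy example. The sketch below is illustrative only: the function names, the choice $F(A) = \sqrt{|A|}$, the vector `z` and the random probe points are assumptions.

```python
import random


def all_subsets(items):
    """List every subset of `items` as a frozenset."""
    items = sorted(items)
    return [frozenset(x for i, x in enumerate(items) if (mask >> i) & 1)
            for mask in range(2 ** len(items))]


def convolution(F, z):
    """G(A) = min over subsets B of A of F(B) + z(A minus B)."""
    def G(A):
        A = frozenset(A)
        return min(F(B) + sum(z[k] for k in A - B) for B in all_subsets(A))
    return G


def in_polyhedron(s, H, ground, tol=1e-9):
    """Check s(A) <= H(A) for every subset A of the ground set."""
    return all(sum(s[k] for k in A) <= H(A) + tol for A in all_subsets(ground))


# Hypothetical example data.
F = lambda A: len(A) ** 0.5
z = [0.5, 2.0, 0.3]
V = range(3)
G = convolution(F, z)

rng = random.Random(0)
points = [[rng.uniform(-1.0, 2.5) for _ in V] for _ in range(300)]
```

Each probe point `s` can then be tested both against `G` directly and against the pair of conditions "`s` in $P(F)$" and "`s` below `z` componentwise"; the two tests are expected to agree.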



Proposition 20 (Monotonization of a submodular function) Let $F$ be a submodular
function such that $F(\emptyset) = 0$. Define $G(A) = \min_{B \supseteq A} F(B) - \min_{B \subseteq V} F(B)$. Then $G$ is
submodular such that $G(\emptyset) = 0$, and the base polyhedron $B(G)$ is equal to $B(F) \cap \{s \geq 0\}$.
Moreover, $G$ is non-decreasing, and for all $A \subseteq V$, $G(A) \leq F(A)$.

Proof Let $c = \min_{B \subset V} F(B)$. Let $A, A' \subset V$, and $B, B'$ the corresponding minimizers
defining $G(A)$ and $G(A')$. We have:

$$\begin{aligned}
G(A) + G(A') &= F(B) + F(B') - 2c \\
&\geq F(B \cup B') + F(B \cap B') - 2c \quad \text{by submodularity} \\
&\geq G(A \cup A') + G(A \cap A') \quad \text{by definition of } G,
\end{aligned}$$

hence the submodularity of $G$. It is obviously non-decreasing. We get $G(A) \leq F(A)$ by
taking $B = A$ in the definition of $G(A)$. Since $G$ is non-decreasing, $B(G) \subset \mathbb{R}_+^p$ (because
all of its extreme points, obtained by the greedy algorithm, are in $\mathbb{R}_+^p$). By definition of
$G$, $B(G) \subset B(F)$. Thus $B(G) \subset B(F) \cap \mathbb{R}_+^p$. The opposite inclusion is trivial from the
definition.
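The monotonization of Prop. 20 can be illustrated by brute force on a small example. This is a sketch under stated assumptions: the helper names and the cardinality-based example `h` are hypothetical, and the example is chosen so that $\min_{B \subset V} F(B) = 0$.

```python
from itertools import combinations

def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def monotonize(F, V):
    """Return G with G(A) = min over supersets B of A of F(B), minus min over all B of F(B)."""
    all_sets = list(subsets(V))
    c = min(F(B) for B in all_sets)
    def G(A):
        return min(F(B) for B in all_sets if A <= B) - c
    return G

# Hypothetical example: h is concave in the cardinality, non-negative, not monotone.
V = frozenset(range(4))
h = [0, 3, 4, 3, 1]
F = lambda A: h[len(A)]
G = monotonize(F, V)

# G is non-decreasing: G(A) <= G(B) whenever A is a subset of B.
print(all(G(A) <= G(B) for A in subsets(V) for B in subsets(V) if A <= B))
```

On this example, $G(A) = 1$ for every non-empty $A$ and $G(\varnothing) = 0$, which is non-decreasing and below $F$.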



6 Proximal optimization problems 

In this section, we consider separable convex functions and the minimization of such functions
penalized by the Lovász extension of a submodular function. When the separable
functions are all quadratic functions, those problems are often referred to as proximal problems
(see, e.g., [17] and references therein). We make the simplifying assumption that
the problem is strictly convex and differentiable (but not necessarily quadratic), but sharp
statements could also be made in the general case. The next proposition shows that it is
equivalent to the maximization of a separable concave function over the base polyhedron.

Proposition 21 (Dual of proximal optimization problem) Let $\psi_1, \dots, \psi_p$ be $p$ continuously
differentiable strictly convex functions on $\mathbb{R}$, with Fenchel-conjugates $\psi_1^*, \dots, \psi_p^*$.
The two following optimization problems are dual of each other:

$$\min_{w \in \mathbb{R}^p} f(w) + \sum_{j=1}^p \psi_j(w_j), \qquad (7)$$

$$\max_{s \in B(F)} - \sum_{j=1}^p \psi_j^*(-s_j). \qquad (8)$$

The pair $(w, s)$ is optimal if and only if $s_k = -\psi_k'(w_k)$ for all $k \in \{1, \dots, p\}$, and $s \in B(F)$ is
optimal for the maximization of $w^\top s$ over $s \in B(F)$ (see Prop. 10 for optimality conditions).
Proof We have:

$$\begin{aligned}
\min_{w \in \mathbb{R}^p} f(w) + \sum_{j=1}^p \psi_j(w_j)
&= \min_{w \in \mathbb{R}^p} \ \max_{s \in B(F)} w^\top s + \sum_{j=1}^p \psi_j(w_j) \\
&= \max_{s \in B(F)} \ \min_{w \in \mathbb{R}^p} w^\top s + \sum_{j=1}^p \psi_j(w_j) \\
&= \max_{s \in B(F)} - \sum_{j=1}^p \psi_j^*(-s_j),
\end{aligned}$$

where $\psi_j^*$ is the Fenchel-conjugate of $\psi_j$ (which may in general have a domain strictly included
in $\mathbb{R}$). Thus the separably penalized problem defined in Eq. (7) is equivalent to a
separable maximization over the base polyhedron (i.e., Eq. (8)). Moreover, the unique optimal
$s$ for Eq. (8) and the unique optimal $w$ for Eq. (7) are related through $s_j = -\psi_j'(w_j)$
for all $j \in V$. ■
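As a worked instance of this duality, consider the quadratic case. The choice $\psi_j(w_j) = \frac{1}{2} w_j^2$ below is an assumption made only for illustration; it is the case used later for the minimum-norm point algorithm.

```latex
% Assumed example: \psi_j(w_j) = \tfrac{1}{2} w_j^2 for all j.
\psi_j^*(s_j) = \sup_{w_j \in \mathbb{R}} \; s_j w_j - \tfrac{1}{2} w_j^2 = \tfrac{1}{2} s_j^2,
\qquad
\min_{w \in \mathbb{R}^p} f(w) + \tfrac{1}{2}\|w\|_2^2
\;=\;
\max_{s \in B(F)} -\tfrac{1}{2}\|s\|_2^2,
\qquad
s_j = -\psi_j'(w_j) = -w_j .
```

In this case the dual problem is the search for the minimum-norm point of $B(F)$, and the primal solution is its opposite.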



For simplicity, we now assume that for all $j \in V$, functions $\psi_j$ are such that $\sup_{\alpha \in \mathbb{R}} \psi_j'(\alpha) = +\infty$
and $\inf_{\alpha \in \mathbb{R}} \psi_j'(\alpha) = -\infty$. This implies that the Fenchel-conjugates $\psi_j^*$ are defined
and finite on $\mathbb{R}$. Following [16], we also consider a sequence of set optimization problems,
parameterized by $\alpha \in \mathbb{R}$:

$$\min_{A \subset V} F(A) + \sum_{j \in A} \psi_j'(\alpha). \qquad (9)$$

We denote by $A^\alpha$ any minimizer of Eq. (9). Note that $A^\alpha$ is a minimizer of the submodular
function $F + \psi'(\alpha)$, where $\psi'(\alpha) \in \mathbb{R}^p$ is the vector of components $\psi_k'(\alpha)$.

The main property, as shown in [16], is that solving Eq. (7), which is a convex optimization
problem, is equivalent to solving Eq. (9) for all possible $\alpha$, which are submodular
optimization problems. We first show a monotonicity property of solutions of Eq. (9).

Proposition 22 (Monotonicity of solutions) If $\alpha > \beta$, then any solutions $A^\alpha$ and $A^\beta$
of Eq. (9) for $\alpha$ and $\beta$ satisfy $A^\alpha \subset A^\beta$.

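Proof We have, by optimality of $A^\alpha$ (compared with $A^\alpha \cap A^\beta$) and of $A^\beta$ (compared with $A^\alpha \cup A^\beta$):

$$F(A^\alpha) + \sum_{j \in A^\alpha} \psi_j'(\alpha) \leq F(A^\alpha \cap A^\beta) + \sum_{j \in A^\alpha \cap A^\beta} \psi_j'(\alpha),$$

$$F(A^\beta) + \sum_{j \in A^\beta} \psi_j'(\beta) \leq F(A^\alpha \cup A^\beta) + \sum_{j \in A^\alpha \cup A^\beta} \psi_j'(\beta),$$

and by summing the two inequalities and using the submodularity of $F$,

$$\sum_{j \in A^\alpha} \psi_j'(\alpha) + \sum_{j \in A^\beta} \psi_j'(\beta) \leq \sum_{j \in A^\alpha \cap A^\beta} \psi_j'(\alpha) + \sum_{j \in A^\alpha \cup A^\beta} \psi_j'(\beta),$$

which is equivalent to $\sum_{j \in A^\alpha \setminus A^\beta} (\psi_j'(\beta) - \psi_j'(\alpha)) \geq 0$, which implies, since for all $j \in V$,
$\psi_j'(\beta) < \psi_j'(\alpha)$ (because of strict convexity), that $A^\alpha \setminus A^\beta = \varnothing$. ■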



The next proposition shows that we can obtain the unique solution of Eq. (7) from all
solutions of Eq. (9).

Proposition 23 (Proximal problem from submodular function minimizations) Given
any solutions $A^\alpha$ of problems in Eq. (9), for all $\alpha \in \mathbb{R}$, we define the vector $u \in \mathbb{R}^p$ as

$$u_j = \sup(\{\alpha \in \mathbb{R},\ j \in A^\alpha\}).$$

Then $u$ is the unique solution of the proximal problem in Eq. (7).

Proof Because $\inf_{\alpha \in \mathbb{R}} \psi_j'(\alpha) = -\infty$, for $\alpha$ small enough, we must have $A^\alpha = V$, and thus
$u_j$ is well-defined and finite for all $j \in V$.

If $\alpha > u_j$, then, by definition of $u_j$, $j \notin A^\alpha$. This implies that $A^\alpha \subset \{j \in V,\ u_j \geq \alpha\} = \{u \geq \alpha\}$.
Moreover, if $u_j > \alpha$, there exists $\beta \in (\alpha, u_j)$ such that $j \in A^\beta$. By the monotonicity
property of Prop. 22, $A^\beta$ is included in $A^\alpha$. This implies $\{u > \alpha\} \subset A^\alpha$.

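We have for all $w \in \mathbb{R}^p$, and $\beta$ less than the smallest of $(w_j)_-$ and the smallest of $(u_j)_-$:

$$\begin{aligned}
f(u) + \sum_{j=1}^p \psi_j(u_j)
&= \int_0^\infty F(\{u \geq \alpha\})\, d\alpha + \int_\beta^0 \big( F(\{u \geq \alpha\}) - F(V) \big)\, d\alpha + \sum_{j=1}^p \Big\{ \int_\beta^{u_j} \psi_j'(\alpha)\, d\alpha + \psi_j(\beta) \Big\} \\
&= C + \int_\beta^\infty \Big[ F(\{u \geq \alpha\}) + \sum_{j=1}^p 1_{u_j \geq \alpha}\, \psi_j'(\alpha) \Big] d\alpha
\quad \text{with } C = \int_0^\beta F(V)\, d\alpha + \sum_{j=1}^p \psi_j(\beta) \\
&\leq C + \int_\beta^\infty \Big[ F(\{w \geq \alpha\}) + \sum_{j=1}^p 1_{w_j \geq \alpha}\, \psi_j'(\alpha) \Big] d\alpha
\quad \text{by optimality of } A^\alpha \\
&= f(w) + \sum_{j=1}^p \psi_j(w_j).
\end{aligned}$$

This shows that $u$ is the unique optimum of the problem in Eq. (7).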



From the previous proposition, we also get the following corollary, i.e., all solutions of Eq. (9)
may be obtained from the unique solution of Eq. (7).

Proposition 24 (Submodular function minimizations from proximal problem) If
$u$ is the unique minimizer of Eq. (7), then for all $\alpha \in \mathbb{R}$, the minimal minimizer of Eq. (9)
is $\{u > \alpha\}$ and the maximal minimizer is $\{u \geq \alpha\}$, that is, the minimizers $A^\alpha$ are the sets
such that $\{u > \alpha\} \subset A^\alpha \subset \{u \geq \alpha\}$.

Given the previous propositions, we can solve a sequence of problems in Eq. (9), with
decreasing $\alpha$'s, in order to obtain the unique minimizer $w$ of Eq. (7). Note that because
of the monotonicity, the sets $A^\alpha$ can only increase. When a certain $j \in V$ enters $A^\alpha$, then
$w_j$ is exactly equal to the corresponding $\alpha$. Once we know the largest values of $w$, we may
redefine the problem by restricting on the unknown indices of $w$, which is valid for smaller
values of $\alpha$.
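The construction $u_j = \sup\{\alpha,\ j \in A^\alpha\}$ of Prop. 23 can be sketched by brute force. The code below assumes the quadratic case $\psi_j(w_j) = \frac{1}{2} w_j^2$, so that $\psi_j'(\alpha) = \alpha$ and Eq. (9) reads $\min_A F(A) + \alpha |A|$; the function name and the enumeration of candidate values of $\alpha$ are illustrative choices, not the procedure of [16], and only suit very small $p$.

```python
from itertools import combinations
import math

def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def proximal_by_set_minimizations(F, V, tol=1e-9):
    """Brute-force u_j = sup{alpha : j in A^alpha} for psi_j(w) = w**2 / 2."""
    all_sets = list(subsets(V))
    # Minimizers can only change where two affine functions
    # alpha -> F(A) + alpha * |A| cross, so these values suffice.
    candidates = {(F(B) - F(A)) / (len(A) - len(B))
                  for A in all_sets for B in all_sets if len(A) != len(B)}
    u = {j: -math.inf for j in V}
    for alpha in candidates:
        best = min(F(A) + alpha * len(A) for A in all_sets)
        for A in all_sets:
            if F(A) + alpha * len(A) <= best + tol:  # A minimizes Eq. (9) at alpha
                for j in A:
                    u[j] = max(u[j], alpha)
    return u

# Hypothetical example: F(A) = min(|A|, 1), whose Lovasz extension is max_k w_k.
u = proximal_by_set_minimizations(lambda A: min(len(A), 1), frozenset({0, 1}))
print(u)
```

For this example, minimizing $\max(w_1, w_2) + \frac{1}{2}\|w\|_2^2$ directly gives $w_1 = w_2 = -1/2$, which is the vector the sketch recovers.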
7 Optimization over the base polyhedron



Optimization of separable functions over the base polyhedron has many applications, e.g.,
minimization of a submodular function (from Prop. 15) and the proximal methods described in
Section 6 (e.g., Prop. 21). In this section, we study these problems in more detail.

7.1 Optimality conditions 

We first show that when optimizing on the base polyhedron $B(F)$, one only needs to
look at directions of the form $\delta_k - \delta_q$ for certain pairs $(k, q)$, which will be said exchangeable
($\delta_k \in \mathbb{R}^p$ is the vector which is entirely equal to zero, except a component equal to one at
position $k$, which can also be denoted $1_{\{k\}}$).

Definition 5 (Tight sets) Given a base $s \in B(F)$, a set $A \subset V$ is said tight if $s(A) = F(A)$.

Proposition 25 (Lattice of tight sets) If $A$ and $B$ are tight for $s \in B(F)$, then $A \cap B$
and $A \cup B$ are also tight for $s$.

Proof We have:

$$F(A \cup B) + F(A \cap B) \geq s(A \cup B) + s(A \cap B) = s(A) + s(B) = F(A) + F(B) \geq F(A \cup B) + F(A \cap B).$$

Thus there is equality everywhere, which leads to the desired result. Note that this shows
that the set of tight sets for $s$ is a lattice.



We now define the notion of exchangeable pairs, which will allow us to describe the tangent
cone of the base polyhedron in Prop. 28.

Definition 6 (Dependence function and exchangeable pairs) Given a base $s \in B(F)$
and $k \in V$, the dependence function $\mathrm{Dep}(s, k)$ is the (non-empty) smallest tight set that contains
$k$. If $q \in \mathrm{Dep}(s, k)$, then the pair $(k, q)$ is said exchangeable.

Prop. 25 shows that $\mathrm{Dep}(s, k)$ is indeed well-defined because $V$ is tight and contains $k$, and
the set of tight sets containing $k$ is a lattice. The following proposition details the most important
properties of exchangeable pairs, which are straightforward given the definition (in
fact, the conjunction of these two properties is equivalent to the definition of exchangeable
pairs).
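Since tight sets form a lattice, $\mathrm{Dep}(s, k)$ is the intersection of all tight sets that contain $k$. The sketch below computes it by enumeration; the function name `dep`, the tolerance, and the example function and base are assumptions made for illustration.

```python
from itertools import combinations

def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def dep(F, V, s, k, tol=1e-9):
    """Smallest tight set containing k: intersection of all tight sets containing k."""
    result = frozenset(V)  # V is tight for any base s, since s(V) = F(V)
    for A in subsets(V):
        if k in A and abs(sum(s[j] for j in A) - F(A)) <= tol:
            result &= A
    return result

# Hypothetical example: F(A) = min(|A|, 1) and the base s = (1, 0, 0).
V = frozenset(range(3))
F = lambda A: min(len(A), 1)
s = [1.0, 0.0, 0.0]

for k in sorted(V):
    print(k, sorted(dep(F, V, s, k)))
```

Here $\mathrm{Dep}(s, 1) = \{0, 1\}$, so $(1, 0)$ is an exchangeable pair: moving a small amount of mass from component 0 to component 1 keeps the vector in $B(F)$.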

Proposition 26 (Properties of exchangeable pairs) Let $s \in B(F)$ and let $(k, q)$ be an
exchangeable pair for $s$. Then:

(a) there exists $A \subset V$ such that $k, q \in A$ and $A$ is tight for $s$,

(b) if $A \subset V$ is tight for $s$, then $k \in A \Rightarrow q \in A$.

The next proposition shows that only these exchangeable pairs need to be considered for 
checking optimality conditions for optimization over the base polyhedron. 
Proposition 27 (Maximizers of support function of the base polyhedron) Let $w \in \mathbb{R}^p$.
The base $s \in B(F)$ is a maximizer of $\max_{s \in B(F)} s^\top w$ if and only if for all $k \in V$ and
$q \in \mathrm{Dep}(s, k)$, $w_k \leq w_q$ (i.e., for all exchangeable pairs).

Proof If $s$ is optimal, then if $k \in V$ and $q \in \mathrm{Dep}(s, k)$, then for $a > 0$ small enough,
$s' = s + a(\delta_k - \delta_q)$ is in $B(F)$ (indeed, if $A$ is not tight, then a small modification of $s$ does
not violate the constraint, and if $A$ is tight and $A \ni k$, then $q \in A$ by Prop. 26 and thus
$s'(A) = s(A)$; finally, if $A$ is tight and $k \notin A$, then $s'(A)$ can only decrease). Optimality of $s$
implies that $w_k \leq w_q$.

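If the condition is true, we can order the values of $w$ as $w_{B_1} > \cdots > w_{B_m}$ (where $w_k = w_{B_j}$
for $k \in B_j$). Let $A_j = B_1 \cup \cdots \cup B_j$, so that $k \in A_j$ if and only if $w_k \geq w_{B_j}$. This implies,
because of the condition, that $A_j = \bigcup_{k \in A_j} \mathrm{Dep}(s, k)$, and thus that $A_j$ is tight (as a union
of tight sets), i.e., $s(A_j) = F(A_j)$. Then, for any $t \in B(F)$, with $A_0 = \varnothing$,

$$\begin{aligned}
s^\top w - t^\top w &= \sum_{k \in V} w_k (s_k - t_k) = \sum_{i=1}^m w_{B_i} [s(B_i) - t(B_i)] \\
&= \sum_{i=1}^m w_{B_i} [(s - t)(A_i) - (s - t)(A_{i-1})] \\
&= \sum_{i=1}^m w_{B_i} [(F - t)(A_i) - (F - t)(A_{i-1})] \\
&= \sum_{i=1}^{m-1} (w_{B_i} - w_{B_{i+1}}) (F - t)(A_i) \geq 0,
\end{aligned}$$

where the last equality uses $(F - t)(A_m) = F(V) - t(V) = 0$, and the inequality uses $t \in P(F)$.

Thus $s$ is optimal. Note that this is also a consequence of Prop. 10.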



From Prop. 27 we may now deduce the tangent cone of the base polyhedron, from which
we then obtain optimality conditions.

Proposition 28 (Tangent cone of base polyhedron) Let $s \in B(F)$. The tangent cone
of $B(F)$ at $s$ is generated by the vectors $\delta_k - \delta_q$ for all $k \in V$ and $q \in \mathrm{Dep}(s, k)$, i.e., for all
exchangeable pairs $(k, q)$.

Proof Given the proof of Prop. 27, each of the vectors $\delta_k - \delta_q$ belongs to the tangent cone.
If the tangent cone strictly contains the conic hull of these vectors, by Farkas lemma (see,
e.g., [11]), there exist $y$ in the tangent cone and $w \in \mathbb{R}^p$ such that for all exchangeable pairs
$(k, q)$, $w^\top (\delta_k - \delta_q) \leq 0$ and $w^\top y > 0$. By the last proposition, $s$ is an optimal base for the
weight vector $w$; however, $s + ay \in B(F)$ for $a > 0$ sufficiently small and $(s + ay)^\top w > s^\top w$,
which is a contradiction. ■



Proposition 29 (Optimality conditions for separable optimization) Let $g_j$ be convex
functions on $\mathbb{R}$, $j = 1, \dots, p$. Then $s \in B(F)$ is a minimizer of $\sum_{j \in V} g_j(s_j)$ over
$s \in B(F)$ if and only if for all exchangeable pairs $(k, q)$, $\partial_+ g_k(s_k) \geq \partial_- g_q(s_q)$, where
$\partial_+ g_k(s_k)$ is the right-derivative of $g_k$ at $s_k$ and $\partial_- g_q(s_q)$ is the left-derivative of $g_q$ at $s_q$.

Proof This is immediate from Prop. 28 related to the tangent cone of $B(F)$. ■
We can give an alternative description of optimality conditions based on Prop. 10, which
we give only for differentiable functions for simplicity.

Proposition 30 (Alternative optimality conditions for separable optimization) Let
$g_j$ be differentiable convex functions on $\mathbb{R}$, $j = 1, \dots, p$. Let $s \in B(F)$ and $w \in \mathbb{R}^p$ defined
as $\forall k \in V,\ w_k = g_k'(s_k)$; define $B(\alpha) = \{w \leq \alpha\}$ for $\alpha \in \mathbb{R}$. Then $s$ is a minimizer of
$\sum_{j \in V} g_j(s_j)$ over $s \in B(F)$ if and only if for all $\alpha \in \mathbb{R}$, the sets $B(\alpha)$ are tight.

Proof Note that the condition has to be checked only for $\alpha$ belonging to the set of values
taken by $w$. We consider the unique values $v_1 < \cdots < v_m$, taken at sets $A_1, \dots, A_m$
(i.e., $V = A_1 \cup \cdots \cup A_m$ and $\forall k \in A_i,\ w_k = v_i$). The condition then becomes that all
$B_i = A_1 \cup \cdots \cup A_i$ are tight for $s$. This is immediate from Prop. 10. Indeed, $s$ is optimal if
and only if $s$ is optimal for the problem $\min_{s \in B(F)} w^\top s$. ■



7.2 Lexicographically optimal bases 

We can give another interpretation to the optimality conditions in Prop. 29. Given a vector
$s \in \mathbb{R}^p$, we denote by $T(s) \in \mathbb{R}^p$ the sequence of components of $s$ in increasing
order. That is, if $s_{j_1} \leq s_{j_2} \leq \cdots \leq s_{j_p}$, then $T(s) = (s_{j_1}, \dots, s_{j_p})$. Given two vectors
$s$ and $s'$ in $\mathbb{R}^p$, $s$ is said lexicographically greater than or equal to $s'$ if either (a) $s = s'$, or
(b) $s \neq s'$, and for the minimum index $i$ such that $s_i \neq s_i'$, we have $s_i \geq s_i'$.

We now show that finding a base $s \in B(F)$ that lexicographically maximizes the ordered
vector of derivatives $g_k'(s_k)$ is equivalent to minimizing $\sum_{k \in V} g_k(s_k)$ over the base polyhedron.
Many algorithms for proximal problems are in fact often cast as maximization for
such lexicographical orders (see, e.g., [18]).

Proposition 31 (Lexicographically optimal base) Let $g_j$ be differentiable strictly convex
functions on $\mathbb{R}$, $j = 1, \dots, p$. Then $s \in B(F)$ lexicographically maximizes the vector
$T(g'(s)) = T[(g_1'(s_1), \dots, g_p'(s_p))]$ over $s \in B(F)$ if and only if $s$ is a minimizer of
$\sum_{k \in V} g_k(s_k)$ over the base polyhedron $B(F)$.

Proof First assume that $s \in B(F)$ lexicographically maximizes the vector $T(g'(s)) =
T[(g_1'(s_1), \dots, g_p'(s_p))]$ over $s \in B(F)$. Then, for any exchangeable pair $(k, q)$ associated with
$s$, we have that $t = s + a(\delta_k - \delta_q) \in B(F)$ for $a > 0$ sufficiently small (from Prop. 28). Moreover,
all components $g_j'(s_j)$ are unchanged, except at the $k$-th and $q$-th positions, for which we have
$g_k'(t_k) > g_k'(s_k)$ and $g_q'(t_q) < g_q'(s_q)$. Thus, if $g_k'(s_k) < g_q'(s_q)$, $T(g'(t))$ is lexicographically
strictly greater than $T(g'(s))$, which is a contradiction. This implies that for all exchangeable
pairs, $g_k'(s_k) \geq g_q'(s_q)$, which implies, by Prop. 29, that $s$ is indeed a minimizer.

Let now $s$ be a minimizer of $\sum_{k \in V} g_k(s_k)$ over the base polyhedron $B(F)$. Let $t$ be a base in
$B(F)$ such that $T(g'(t))$ is lexicographically greater than or equal to $T(g'(s))$. We consider
$v = g'(t) \in \mathbb{R}^p$ and $w = g'(s) \in \mathbb{R}^p$. We denote by $w_{A_1} < \cdots < w_{A_m}$ the $m$ distinct values of
$w \in \mathbb{R}^p$, taken on the subsets $A_j$, $j = 1, \dots, m$. From Prop. 10, the sets $B_j = A_1 \cup \cdots \cup A_j$
are tight for $s$. We show by induction on $j$ that for $k \in B_j$, $s_k = t_k$, which will show that
we must have $s = t$, and thus that $T(g'(s))$ is lexicographically optimal.

This is true for $j = 0$ (with $B_0 = \varnothing$), and if we assume it is true for $j$, then, since $T(v)$ is lexicographically
greater than or equal to $T(w)$, we have for all $k \in A_{j+1}$, $v_k \geq w_k$ (since all the smaller
ones are equal by the induction assumption), which implies, by strict convexity of $g_k$, that
$t_k \geq s_k$. Moreover, since $B_{j+1}$ is tight, we have $F(B_{j+1}) \geq t(B_{j+1}) \geq s(B_{j+1}) = F(B_{j+1})$,
which implies that $t_k = s_k$ for $k \in A_{j+1}$.



7.3 Optimization for proximal problems 

We can now obtain from the base polyhedron perspective the previous results linking the problems
in Eq. (7) and Eq. (9), i.e., give an alternative proof of Prop. 23 from Section 6.

Indeed, from Prop. 29, $s$ is optimal for $\max_{s \in B(F)} - \sum_{j=1}^p \psi_j^*(-s_j)$ if and only if for all
exchangeable pairs $(k, q)$ for $s$, $(\psi_k^*)'(-s_k) \leq (\psi_q^*)'(-s_q)$. If we denote $w_k = (\psi_k^*)'(-s_k)$
(which is equivalent to $s_k = -\psi_k'(w_k)$), then $s$ is optimal if and only if $w_k \leq w_q$ for all exchangeable
pairs $(k, q)$.

Let $\alpha \in \mathbb{R}$; we consider the optimization problem

$$\max_{s \in B(F)} \sum_{k \in V} (s_k + \psi_k'(\alpha))_-. \qquad (10)$$

From Prop. 29 and the fact that the right-derivative of $s_k \mapsto -(s_k + \psi_k'(\alpha))_-$ is $-1$ for
$s_k < -\psi_k'(\alpha)$ and zero otherwise, and that its left-derivative is $-1$ for
$s_k \leq -\psi_k'(\alpha)$ and zero otherwise, $s$ is optimal if and only if for all exchangeable pairs $(k, q)$
for $s$, we have $1_{\{s_k < -\psi_k'(\alpha)\}} \leq 1_{\{s_q \leq -\psi_q'(\alpha)\}}$, which is equivalent to the fact that $s_k < -\psi_k'(\alpha)$
implies that $s_q \leq -\psi_q'(\alpha)$.

If $s$ is optimal for Eq. (8), then $(\psi_k^*)'(-s_k) \leq (\psi_q^*)'(-s_q)$ for all exchangeable pairs. Thus,
if $s$ is optimal for Eq. (8), then $s$ is optimal for the maximization of Eq. (10) for all $\alpha \in \mathbb{R}$.

Finally, from Prop. 15, solving Eq. (10) is equivalent to minimizing the submodular function
$F + \psi'(\alpha)$, which is exactly Eq. (9). Also, from Prop. 15, we have that any optimal $A^\alpha$
satisfies $\{s + \psi'(\alpha) < 0\} \subset A^\alpha \subset \{s + \psi'(\alpha) \leq 0\}$. Moreover, since at the optimum
$s_k + \psi_k'(w_k) = 0$, we have $s_k + \psi_k'(\alpha) < 0$ if and only if $w_k > \alpha$, and $s_k + \psi_k'(\alpha) \leq 0$ if
and only if $w_k \geq \alpha$. We thus get back Prop. 23.



8 Submodular function minimization 

Several generic algorithms may be used for the minimization of a submodular function.
They are all based on a sequence of evaluations of $F(A)$ for certain subsets $A \subset V$. For
specific functions, such as the ones defined from cuts, faster algorithms exist (see, e.g.,
[19] and Section 10.2).

Note that maximizing submodular functions is a hard combinatorial problem in general.
However, when maximizing a non-decreasing submodular function under a cardinality constraint,
the simple greedy method yields a $(1 - 1/e)$-approximation [20].

In this section, we first review classical approaches for submodular function minimization.
The first approach, presented in Section 8.1, is the most efficient in practice, but has no
known complexity bound. We briefly mention in Section 8.2 existing combinatorial algorithms
with theoretical complexity bounds, but these are not used in practice. In Section 8.3 we
consider certain submodular functions, so-called posimodular functions, for which simple
combinatorial algorithms exist with better complexity.

We then present algorithms which are based on a sequence of submodular function minimizations,
and that can be used for problems such as line search in the submodular polyhedron
or proximal problems.

8.1 Minimum-norm point algorithm 

From Eq. (8) or Prop. 24, we obtain that if we know how to minimize $f(w) + \frac{1}{2}\|w\|_2^2$, or
equivalently, minimize $\frac{1}{2}\|s\|_2^2$ such that $s \in B(F)$, then we get all minimizers of $F$ from the
negative components of $s$.

The minimum-norm point algorithm computes the minimum of $\|s\|_2$ for $s \in B(F)$. It uses
an old algorithm from [21] that will find a minimum-norm base $s \in B(F)$ in a finite number
of steps. This is made possible by the fact that we know how to efficiently maximize linear
functions over $B(F)$, where solutions are obtained by the greedy algorithm from Prop. 5.

The complexity of each step of the algorithm is essentially $O(p)$ function evaluations and
operations of order $O(p^3)$. However, there are no known upper bounds on the number of
iterations.

Note that once we know which components of the optimal vector $s$ should be equal, greater or
smaller, we obtain all values in closed form. Indeed, let $c_1 < c_2 < \cdots < c_m$ be the $m$
different values taken by $s$, and $A_j$ the corresponding sets such that $s_k = c_j$ for
$k \in A_j$. We then have:

$$c_j = \frac{F(A_1 \cup \cdots \cup A_j) - F(A_1 \cup \cdots \cup A_{j-1})}{|A_j|},$$

which allows us to compute the values $c_j$ knowing only the sets $A_j$.

8.2 Combinatorial algorithms 

Algorithms are based on Prop. 15, i.e., on the identity $\min_{A \subset V} F(A) = \max_{s \in B(F)} s_-(V)$.
Combinatorial algorithms will usually output the subset $A$ and a base $s \in B(F)$ such that
$A$ is tight for $s$ and $\{s < 0\} \subset A \subset \{s \leq 0\}$, as a certificate of optimality.

Most algorithms will also output the largest minimizer $A$ of $F$, or sometimes describe the
entire lattice of minimizers. The best algorithms have polynomial complexity [22, 23, 24], but
still have high complexity (typically $O(p^6)$ or more).
8.3 Minimizing posimodular functions

A submodular function $F$ is said symmetric if for all $B \subset V$, $F(V \setminus B) = F(B)$. By applying
submodularity, we get that $2F(B) = F(V \setminus B) + F(B) \geq F(V) + F(\varnothing) = 2F(\varnothing) = 0$, which
implies that $F$ is non-negative. Hence its global minimum is attained at $V$ and $\varnothing$.

Such functions can be minimized in time $O(p^3)$ over all non-trivial (i.e., different from $\varnothing$
and $V$) subsets of $V$ [25]. Moreover, the algorithm is valid for the regular minimization of
posimodular functions [26], i.e., of functions that satisfy

$$\forall A, B \subset V, \quad F(A) + F(B) \geq F(A \setminus B) + F(B \setminus A).$$

These include symmetric submodular functions as well as non-negative modular functions, and hence the
sum of any of those (in particular, cuts with sinks and sources, as presented in Section 10.2).
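Posimodularity can be checked directly from its definition on small examples. The sketch below is illustrative only: the helper names and the small cycle graph used to define the cut function are assumptions, not taken from the text.

```python
from itertools import combinations

def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def is_posimodular(F, V, tol=1e-9):
    """Brute-force check of F(A) + F(B) >= F(A minus B) + F(B minus A)."""
    all_sets = list(subsets(V))
    return all(F(A) + F(B) >= F(A - B) + F(B - A) - tol
               for A in all_sets for B in all_sets)

# Hypothetical example: undirected cut function of a 4-cycle with unit weights.
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]

def cut(A):
    """Number of edges with exactly one endpoint in A (a symmetric submodular function)."""
    return sum(1 for (i, j) in edges if (i in A) != (j in A))

V = frozenset(range(4))
print(is_posimodular(cut, V))                          # symmetric submodular
print(is_posimodular(lambda A: cut(A) + len(A), V))    # plus a non-negative modular term
print(is_posimodular(lambda A: -len(A), V))            # negative modular function
```

The last call shows why non-negativity matters for modular functions: for a modular $z$, the posimodular inequality reduces to $2 z(A \cap B) \geq 0$.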

8.4 Line search in submodular polyhedron 

The general line search problem in the submodular polyhedron amounts to starting from
$s \in P(F)$ and searching in the direction $t \in \mathbb{R}^p$, i.e., finding the maximal $\lambda \geq 0$ such that
$s + \lambda t \in P(F)$, which is equivalent to $\lambda t \in P(F - s)$. Note that since $s \in P(F)$, $F - s$ is
submodular and non-negative.

We thus now assume that $F$ is non-negative and that $s = 0$. Given $t \in \mathbb{R}^p$, we consider
the problem of finding the largest $\lambda \geq 0$ such that $\lambda t \in P(F)$. We denote by $\mu$ the optimal
value (which is finite as soon as there is at least one $t_k > 0$, which we assume). We have
$\lambda \leq \mu$ if and only if $g(\lambda) = \min_{A \subset V} F(A) - \lambda t(A) \geq 0$. More precisely, $g(\lambda) \geq 0$ if and only
if $\lambda t \in P(F)$. Moreover, $g(0) = 0$ and $g$ is non-increasing, which implies that $g$ is zero on
$[0, \mu]$ and then strictly negative.

We thus need to find the largest zero of the function $g(\lambda)$, which is piecewise affine. This can be
done with the secant method, once we have a $\lambda > 0$ such that $g(\lambda) < 0$. Such a $\lambda$ can be
obtained by noting that $P(F)$ is included in $\{s,\ \forall k \in V,\ s_k \leq F(\{k\})\}$, which implies that
if $\lambda > \min_{k \in V,\ t_k > 0} F(\{k\}) / t_k$, then $g(\lambda) < 0$.

The secant method simply starts with a $\lambda$ such that $g(\lambda) < 0$, finds the
minimizer $A$ in the definition of $g(\lambda)$, sets $\lambda = F(A)/t(A)$, and starts again as long as $g(\lambda) < 0$
(see [27] for more details). Note that if the minimum-norm point algorithm is used for
submodular function minimization, then we obtain instead a minimizer of $w \mapsto f(w) -
\lambda w^\top t + \frac{1}{2}\|w\|_2^2$, and we can also update $\lambda$ as $\lambda = (f(w) + \frac{1}{2}\|w\|_2^2)/(w^\top t)$.
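The iteration described above can be sketched as follows, with the submodular function minimization replaced by brute-force enumeration. This is an illustration under stated assumptions: the function name `line_search`, the tolerance, and the example data are hypothetical; $F$ is assumed non-negative with $F(\varnothing) = 0$, and at least one $t_k$ is assumed positive.

```python
from itertools import combinations
import math

def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def line_search(F, V, t, tol=1e-9):
    """Largest lam >= 0 such that lam * t belongs to P(F), for non-negative F."""
    all_sets = list(subsets(V))
    t_of = lambda A: sum(t[k] for k in A)
    # Upper bound on the answer, from the constraints s_k <= F({k}).
    lam = min(F(frozenset({k})) / t[k] for k in V if t[k] > 0)
    while True:
        # Brute-force minimizer in the definition of g(lam).
        A = min(all_sets, key=lambda S: F(S) - lam * t_of(S))
        if F(A) - lam * t_of(A) >= -tol:
            return lam            # g(lam) = 0 up to the tolerance
        lam = F(A) / t_of(A)      # zero of the affine piece attained at A

# Hypothetical example: F(A) = sqrt(|A|) and a direction with a negative component.
V = frozenset(range(3))
F = lambda A: math.sqrt(len(A))
t = {0: 2.0, 1: -1.0, 2: 1.0}
print(line_search(F, V, t))
```

In this example the binding constraint is $A = \{0, 2\}$, for which $\lambda\, t(A) \leq F(A)$ reads $3\lambda \leq \sqrt{2}$.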

8.5 Homotopy method for proximal problems 

We review in Section 8.5 and Section 8.6 two strategies for maximizing separable concave
functions on the base polyhedron. One strategy is based on the equivalence with the sequence
of minimizations of submodular functions (Prop. 23). The other one is based on a
decomposition strategy.

The first method is based on the fact that if $\alpha$ is large enough, then $A^\alpha = \varnothing$ is optimal
for Eq. (9). From Prop. 15, this is valid as long as $0 \in P(F + \psi'(\alpha))$, i.e.,
$-\psi'(\alpha) \in P(F)$. The minimum $\alpha \in \mathbb{R}$ such that this is valid can be obtained by line search.

Once the minimal $\alpha$ is found, and $A$ is the maximal tight set associated with $-\psi'(\alpha)$,
then if $A = V$, $w = \alpha 1_V$. Otherwise, we let $w_A = \alpha 1_A$, and in order to determine $w_{V \setminus A}$
we recursively apply the same procedure to the function $F_{V \setminus A} : 2^{V \setminus A} \to \mathbb{R}$, defined as
$F_{V \setminus A}(B) = F(B)$ (i.e., restriction of $F$ to $V \setminus A$).

This algorithm, adapted from [28] (see also Sec. 9.2]), requires being able to find the
minimum $\alpha$ such that $-\psi'(\alpha) \in P(F)$. This may be done as follows (same procedure as in
Section 8.4, but extended to non-quadratic functions).

Consider $g(\alpha) = \min_{A \subset V} F(A) + \psi'(\alpha)(A)$. The function is piecewise smooth and
non-decreasing (strictly increasing wherever it is negative). It is equal to zero if and only if $-\psi'(\alpha) \in P(F)$, and it is strictly negative
otherwise. We start with a point $\alpha_0$ such that $g(\alpha_0) < 0$, and we let $A_0$ be a minimizer in the
definition of $g(\alpha_0)$. We find the unique $\alpha_1$ such that $F(A_0) + \psi'(\alpha_1)(A_0) = 0$ and we start
again from $\alpha_1$, until the current point satisfies $g(\alpha) = 0$.

In order to find $\alpha_0$ such that $g(\alpha_0) < 0$, we use the fact that $P(F) \subset \prod_{k \in V} (-\infty, F(\{k\})]$,
and thus if there exists $k \in V$ such that $\psi_k'(\alpha) < -F(\{k\})$, then $-\psi'(\alpha) \notin P(F)$. We can thus
consider $\alpha_0 = \min_{k \in V} (\psi_k')^{-1}(-F(\{k\}))$.

8.6 Decomposition algorithm for proximal problems 

We adapt the algorithm of [29] and Sec. 8.2]. Note that it can be slightly modified for
problems with non-decreasing submodular functions [29] (see also Section 9).

For simplicity, we consider strictly convex differentiable functions $g_j$, $j = 1, \dots, p$, and the
following algorithm:

1. Find the unique minimizer $t \in \mathbb{R}^p$ of $\sum_{j \in V} g_j(t_j)$ such that $t(V) = F(V)$.

2. Minimize the submodular function $F - t$, i.e., find the largest $A \subset V$ that minimizes
$F(A) - t(A)$.

3. If $A = V$, then $t$ is optimal. Exit.

4. Find a minimizer $s_A$ of $\sum_{j \in A} g_j(s_j)$ over $s$ in the base polyhedron associated to $F_A$,
the restriction of $F$ to $A$.

5. Find a minimizer $s_{V \setminus A}$ of $\sum_{j \in V \setminus A} g_j(s_j)$ over $s$ in the base polyhedron associated to
the contraction $F^A$ of $F$ on $A$, defined as $F^A(B) = F(A \cup B) - F(A)$.

6. Concatenate $s_A$ and $s_{V \setminus A}$. Exit.
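The six steps above can be sketched recursively. The code below makes two simplifying assumptions that are not in the original: it takes $g_j(s_j) = \frac{1}{2} s_j^2$ (so Step 1 gives the constant vector $t_j = F(V)/|V|$ and the output is the minimum-norm point of $B(F)$), and it performs Step 2 by brute-force enumeration. The function name `decompose` and the example are hypothetical.

```python
from itertools import combinations
import math

def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def decompose(F, V, tol=1e-9):
    """Minimize sum_j s_j**2 / 2 over B(F); requires F(empty set) = 0."""
    V = frozenset(V)
    if not V:
        return {}
    # Step 1: for quadratic g_j, the minimizer under t(V) = F(V) is constant.
    t = F(V) / len(V)
    # Step 2: largest minimizer of F - t, as the union of all minimizers.
    all_sets = list(subsets(V))
    best = min(F(S) - t * len(S) for S in all_sets)
    A = frozenset().union(*[S for S in all_sets if F(S) - t * len(S) <= best + tol])
    # Step 3: if A = V, the constant vector is optimal.
    if A == V:
        return {j: t for j in V}
    # Step 4: restriction of F to A (F itself, evaluated on subsets of A).
    s = decompose(F, A, tol)
    # Step 5: contraction of F on A, defined on subsets of V minus A.
    FA = F(A)
    s.update(decompose(lambda B: F(A | B) - FA, V - A, tol))
    # Step 6: the two partial solutions are concatenated in the dictionary s.
    return s

# Hypothetical example: concave function of the cardinality plus a modular term.
z = {0: 0.5, 1: -1.0, 2: 0.2, 3: 0.0}
F = lambda A: math.sqrt(len(A)) + sum(z[k] for k in A)
V = frozenset(z)
s = decompose(F, V)
print({k: round(v, 4) for k, v in sorted(s.items())})
```

A convenient sanity check, following Prop. 30, is that every level set $\{s \leq c\}$ of the output is tight for $F$.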

The algorithm must stop after at most $p$ iterations. Indeed, if $A \neq V$ in Step 3, then we
must have $A \neq \varnothing$ (indeed, $A = \varnothing$ implies that $t \in P(F)$, which in turn implies that $A = V$
because by construction $t(V) = F(V)$, which leads to a contradiction). Thus we actually
split $V$ into two non-trivial parts $A$ and $V \setminus A$.

We now need to prove optimality. Let $s$ be the output of the algorithm. We first show that
$s \in B(F)$. We have for any $B \subset V$:

$$\begin{aligned}
s(B) &= s(B \cap A) + s(B \cap (V \setminus A)) \\
&\leq F(B \cap A) + F(A \cup B) - F(A) \quad \text{by definition of } s_A \text{ and } s_{V \setminus A} \\
&\leq F(B) \quad \text{by submodularity.}
\end{aligned}$$

Thus $s$ is indeed in the submodular polyhedron $P(F)$. Moreover, we have $s(V) = s_A(A) +
s_{V \setminus A}(V \setminus A) = F(A) + F(V) - F(A) = F(V)$, i.e., $s$ is in the base polyhedron $B(F)$.

We now construct a second base $\bar s \in B(F)$ as follows: $\bar s_A$ is the minimizer of $\sum_{j \in A} g_j(s_j)$
over $s$ in the base polyhedron associated to the submodular polyhedron $P(F_A) \cap \{s_A \leq t_A\}$.
From Prop. 19, the associated submodular function is $H_A(B) = \min_{C \subset B} F(C) + t(B \setminus C)$.
We have $H_A(A) = \min_{C \subset A} F(C) - t(C) + t(A) = F(A)$ because $A$ is the largest minimizer
of $F - t$. Thus, the base polyhedron associated with $H_A$ is simply $B(F_A) \cap \{s_A \leq t_A\}$.
Moreover, from Prop. 19, we have that $H_A \leq F_A$, and thus any set that is tight for $\bar s_A$ with respect to $F_A$ is
also tight with respect to $H_A$.

Moreover, we define $\bar s_{V \setminus A}$ as the minimizer of $\sum_{j \in V \setminus A} g_j(s_j)$ over the base polyhedron $B(J^A)$,
where we define the submodular function $J^A$ on $V \setminus A$ as follows: $J^A(B) = \min_{C \supset B} F(C \cup
A) - F(A) - t(C) + t(B)$. Then $J^A - t$ is non-decreasing and submodular (by Proposition 20).
Moreover, $J^A(V \setminus A) = F(V) - F(A)$ and $J^A \leq F^A$. Finally, $B(F^A) \cap \{s_{V \setminus A} \geq t_{V \setminus A}\} =
B(J^A)$, and thus any set that is tight for $\bar s_{V \setminus A}$ with respect to $F^A$ is also tight with respect to $J^A$.

We now show that $\bar s$ is optimal for the problem. Since $s$ is at least as good as $\bar s$ for the objective
(its two parts are optimal over larger sets), the base $s$ will then be optimal as well. Take an exchangeable pair $(k, q)$ for $\bar s$.
We have several cases (note that $A$ is tight for $\bar s$):

• $k \in A$ implies $q \in A$ (by Prop. 26, since $A$ is tight), and thus the optimality condition
stems from the sub-problem on $A$ (since being tight for $F_A$ implies being tight for $H_A$).

• $k \notin A$, $q \in A$: it comes from $\bar s_A \leq t_A$ and $\bar s_{V \setminus A} \geq t_{V \setminus A}$, which implies $g_k'(\bar s_k) \geq g_q'(\bar s_q)$
(since all $g_k'(t_k)$ are equal by definition of $t$).

• $k \notin A$, $q \notin A$: it comes from the optimality of the sub-problem on $V \setminus A$ (since being
tight for $F^A$ implies being tight for $J^A$).

In all cases, for exchangeable pairs $(k, q)$, we have $g_k'(\bar s_k) \geq g_q'(\bar s_q)$ and thus, by Prop. 29,
$\bar s$ is optimal and hence $s$ is optimal. Note that we could also have used Prop. 30 to show
optimality.

Note finally that similar algorithms may be applied when we restrict $s$ to be integers (see,
e.g., [29, 6]).

9 Polymatroids (non-decreasing submodular functions)

When the submodular function $F$ is also non-decreasing, i.e., when for $A, B \subset V$, $A \subset B \Rightarrow
F(A) \leq F(B)$, then a truncated greedy algorithm may be applied for all linear functions
(i.e., with potentially negative coefficients). Such non-decreasing and submodular functions
are often referred to as polymatroid set-functions or $\beta$-functions [30]. Note that in this
situation, the Lovász extension is non-decreasing with respect to all components, i.e., if
$w \leq w'$, then $f(w) \leq f(w')$.

Proposition 32 (Truncated greedy algorithm) Assume $F$ is submodular and non-decreasing.
Let $w \in \mathbb{R}^p$; a maximizer of $\max_{s \in P(F),\ s \geq 0} w^\top s$ may be obtained by the following algorithm:
order all the strictly positive components of $w$, as $w_{j_1} \geq \cdots \geq w_{j_m} > 0$, and define
$s_{j_k} = F(\{j_1, \dots, j_k\}) - F(\{j_1, \dots, j_{k-1}\})$ for $k \leq m$, and zero otherwise. Moreover,
$\max_{s \in P(F),\ s \geq 0} w^\top s = f(w_+)$.

Proof The proof is similar to that of Prop. 5. The constraint $w_k = \sum_{A \ni k} \lambda_A$ is simply
replaced by $w_k \leq \sum_{A \ni k} \lambda_A$ (because of the new constraint $s \geq 0$). The vector $s$ is then
feasible because of the monotonicity of $F$. ■
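The truncated greedy algorithm of Prop. 32 is short enough to write out directly. The sketch below is illustrative: the function name, the list-based indexing of $V = \{0, \dots, p-1\}$, and the example function are assumptions made here.

```python
from itertools import combinations
import math

def truncated_greedy(F, w):
    """Maximizer of w^T s over {s in P(F), s >= 0}, for F submodular and non-decreasing."""
    p = len(w)
    s = [0.0] * p
    # Keep only strictly positive weights, sorted in decreasing order.
    order = sorted((j for j in range(p) if w[j] > 0), key=lambda j: -w[j])
    chosen = frozenset()
    for j in order:
        s[j] = F(chosen | {j}) - F(chosen)   # marginal gain of adding j
        chosen = chosen | {j}
    return s

# Hypothetical example: F(A) = sqrt(|A|), with one negative and two positive weights.
F = lambda A: math.sqrt(len(A))
w = [2.0, -1.0, 0.5]
s = truncated_greedy(F, w)
print(s)

# Feasibility in the positive polyhedron, checked by enumeration.
print(all(sum(s[k] for k in A) <= F(frozenset(A)) + 1e-9
          for r in range(len(w) + 1) for A in combinations(range(len(w)), r)))
```

The component with a negative weight is set to zero, and the objective value $w^\top s$ coincides with $f(w_+)$ computed by the usual greedy formula on $w_+ = (2, 0, 0.5)$.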



We can also specialize several other results to polymatroids. In this setting, it is easy to
see that the base polyhedron $B(F)$ is included in the positive orthant (this is for example a
consequence of the greedy algorithm from Prop. 5). However, $P(F)$ is not included in the
positive orthant, and it is common to consider the positive polyhedron

$$P_+(F) = P(F) \cap \mathbb{R}_+^p = \{s \geq 0,\ \forall A \subset V,\ s(A) \leq F(A)\},$$

which is compact (while $P(F)$ never is, as it is unbounded).

We now extend Prop. 10 and Prop. 9, related to support functions, to the independence
polyhedron $P_+(F)$, as well as Prop. 12, related to faces of the polyhedron.

Proposition 33 (Maximizers of the support function of independence polyhedron)
Let $F$ be a non-decreasing submodular function such that $F(\varnothing) = 0$. Let $w \in \mathbb{R}^p$, with unique
values $v_1 > \cdots > v_m$, taken at sets $A_1, \dots, A_m$. Then $s$ is optimal for $\max_{s \in P_+(F)} w^\top s$ if and
only if for all $i = 1, \dots, m$, $v_i < 0 \Rightarrow s_{A_i} = 0$, and $v_i > 0 \Rightarrow s(A_1 \cup \cdots \cup A_i) = F(A_1 \cup \cdots \cup A_i)$.

Proof The proof follows the same arguments as for Prop. 9, with a special treatment
for the negative values of $w$. ■



Proposition 34 (Faces of the independence polyhedron) Let $F$ be a non-decreasing
submodular function such that $F(\varnothing) = 0$. Let $B$ be a stable set (i.e., such that all strictly
larger subsets have strictly greater function values), and $A_1 \cup \cdots \cup A_m$ an ordered partition
of $B$, such that for all $j \in \{1, \dots, m\}$, $A_j$ is inseparable for the function $G_j : C \mapsto F(A_1 \cup
\cdots \cup A_{j-1} \cup C) - F(A_1 \cup \cdots \cup A_{j-1})$ defined on subsets of $A_j$. Then the set of $s \in P_+(F)$
such that for all $j \in \{1, \dots, m\}$, $s(A_1 \cup \cdots \cup A_j) = F(A_1 \cup \cdots \cup A_j)$, and $s_{V \setminus B} = 0$, is a
proper face of $P_+(F)$ with non-empty relative interior.

Proof We have a face from Prop. 33, and it has non-empty relative interior by applying Prop. 11
on each submodular function $G_j$, and using the stability of $B$. ■



We now show how to minimize a separable convex function on the submodular polyhedron
or the positive submodular polyhedron (rather than on the base polyhedron). We first show
the following proposition for the submodular polyhedron of any submodular function (not
necessarily non-decreasing).

Proposition 35 (Separable optimization on the submodular polyhedron) Assume
that $F$ is submodular. Let $\psi_j$, $j = 1, \dots, p$, be $p$ convex functions such that $\psi_j^*$ is defined
and finite on $\mathbb{R}$. Let $(v, t)$ be a primal-dual optimal pair for the problem

$$\min_{v \in \mathbb{R}^p} \ \max_{t \in B(F)} t^\top v + \sum_{k \in V} \psi_k(v_k) = \min_{v \in \mathbb{R}^p} f(v) + \sum_{k \in V} \psi_k(v_k) = \max_{t \in B(F)} - \sum_{k \in V} \psi_k^*(-t_k).$$
v ; kev kev v ' k&v 
For $k \in V$, let $s_k$ be a maximizer of $-\psi_k^*(-s_k)$ on $(-\infty, t_k]$. Define $w = v_+$. Then $(w, s)$ is
a primal-dual optimal pair for the problem

$$\min_{w \in \mathbb{R}_+^p} \ \max_{s \in P(F)} s^\top w + \sum_{k \in V} \psi_k(w_k) = \min_{w \in \mathbb{R}_+^p} f(w) + \sum_{k \in V} \psi_k(w_k) = \max_{s \in P(F)} - \sum_{k \in V} \psi_k^*(-s_k).$$

Proof The pair $(w, s)$ is optimal if and only if $w_k s_k + \psi_k(w_k) + \psi_k^*(-s_k) = 0$, i.e., $(w_k, s_k)$
is a Fenchel-dual pair for $\psi_k$, and $f(w) = s^\top w$. The first statement is true by construction
(indeed, if $s_k = t_k$, then this is a consequence of optimality for the first problem, and if $s_k < t_k$,
then $w_k = (\psi_k^*)'(-s_k) = 0$).

For the second statement, notice that $s$ is obtained from $t$ by keeping the components
of $t$ corresponding to strictly positive values of $v$ (let $K$ denote that subset), and lowering
the ones for $V \setminus K$. For $\alpha > 0$, the level sets $\{w \geq \alpha\}$ are equal to $\{v \geq \alpha\} \subset K$.
Thus, by Prop. 10, all of these are tight for $t$ and hence for $s$, because these sets are included
in $K$ and $s_K = t_K$. This shows, by Prop. 9, that $s \in P(F)$ is optimal for $\max_{s \in P(F)} w^\top s$. ■



Note that Prop. 35 involves primal-dual pairs $(w, s)$ and $(v, t)$, but that we can define $w$
from $v$ only, and define $s$ from $t$ only; thus, primal-only views and dual-only views are
possible. This also applies to Prop. 36.

Proposition 36 (Separable optimization on the positive submodular polyhedron)
Assume that $F$ is submodular and non-decreasing. Let $\psi_j$, $j = 1, \dots, p$, be $p$ convex functions
such that $\psi_j^*$ is defined and finite on $\mathbb{R}$. Let $(v, t)$ be a primal-dual optimal pair for
the problem

$$\min_{v \in \mathbb{R}^p} \max_{t \in B(F)} t^\top v + \sum_{k \in V} \psi_k(v_k) = \min_{v \in \mathbb{R}^p} f(v) + \sum_{k \in V} \psi_k(v_k) = \max_{t \in B(F)} - \sum_{k \in V} \psi_k^*(-t_k).$$

For $k \in V$, let $s_k$ be a maximizer of $-\psi_k^*(-s_k)$ on $[0, t_k]$. For all $k$, define $w_k$ through
$s_k + \psi_k'(w_k) = 0$. Then $(w, s)$ is a primal-dual optimal pair for the problem

$$\min_{w \in \mathbb{R}^p} \max_{s \in P_+(F)} s^\top w + \sum_{k \in V} \psi_k(w_k) = \min_{w \in \mathbb{R}^p} f(w_+) + \sum_{k \in V} \psi_k(w_k) = \max_{s \in P_+(F)} - \sum_{k \in V} \psi_k^*(-s_k).$$

Proof We first apply Prop. 35 to the convex functions $\tilde{\psi}_k(w_k) = \min_{v_k \leq w_k} \psi_k(v_k)$, whose
Fenchel conjugates are equal to $\psi_k^*(s_k)$ if $s_k \leq 0$ and $+\infty$ otherwise. We obtain the minimum
over $\mathbb{R}_+^p$ of $f(w) + \sum_{k \in V} \tilde{\psi}_k(w_k)$. Since $f$ is non-decreasing with respect to each variable
taken separately (because $F$ is non-decreasing), this is equivalent to minimizing over $\mathbb{R}^p$, i.e., to
$\min_{w \in \mathbb{R}^p} f(w_+) + \sum_{k \in V} \psi_k(w_k)$. ■



10 Examples of submodular functions 

We now present classical examples of submodular functions. For each of these, we also
describe the corresponding Lovász extensions, and, when appropriate, the associated
submodular polyhedra.

10.1 Cardinality-based functions



We consider functions that depend only on $s(A)$ for a certain $s \in \mathbb{R}_+^p$. If $s = 1_V$, these are
functions of the cardinality. The next proposition shows that only concave functions lead to
submodular functions, which is coherent with the diminishing return property from Section 1
(Prop. 1).

Proposition 37 (Submodularity of cardinality-based set-functions) If $s \in \mathbb{R}_+^p$ and
$g : \mathbb{R}_+ \to \mathbb{R}$ is a concave function, then $F : A \mapsto g(s(A))$ is submodular. If $F : A \mapsto g(s(A))$
is submodular for all $s \in \mathbb{R}_+^p$, then $g$ is concave.

Proof The function $F : A \mapsto g(s(A))$ is submodular if and only if for all $A \subset V$ and
distinct $j, k \in V \setminus A$: $g(s(A) + s_k) - g(s(A)) \geq g(s(A) + s_k + s_j) - g(s(A) + s_j)$. If $g$ is
concave and $a \geq 0$, $t \mapsto g(a + t) - g(t)$ is non-increasing, hence the first result. Moreover, if
$t \mapsto g(a + t) - g(t)$ is non-increasing for all $a \geq 0$, then $g$ is concave, hence the second result. ■
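The role of concavity can be checked numerically on a small ground set. The following is a minimal brute-force sketch; the weights `s` and the two choices of `g` are hypothetical values picked for illustration, and the helper names are not from the text.

```python
import math
from itertools import combinations

def subsets(ground):
    """Return all subsets of `ground` as frozensets."""
    items = list(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_submodular(F, ground, tol=1e-9):
    """Brute-force test of F(A) + F(B) >= F(A | B) + F(A & B)."""
    all_sets = subsets(ground)
    return all(F(A) + F(B) + tol >= F(A | B) + F(A & B)
               for A in all_sets for B in all_sets)

V = range(4)
s = [0.5, 1.0, 2.0, 3.0]   # hypothetical non-negative weights

def composed(g):
    """Set-function A -> g(s(A)) for the weights s above."""
    return lambda A: g(sum(s[k] for k in A))

concave_case = is_submodular(composed(math.sqrt), V)        # g concave
convex_case = is_submodular(composed(lambda x: x * x), V)   # g strictly convex
print(concave_case, convex_case)
```

With the concave square root the inequality holds for every pair of subsets, while the strictly convex square already violates it for two disjoint singletons.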



Proposition 38 (Lovász extension of cardinality-based set-functions) Let $s \in \mathbb{R}_+^p$
and $g : \mathbb{R}_+ \to \mathbb{R}$ be a concave function such that $g(0) = 0$. The Lovász extension of the
submodular function $F : A \mapsto g(s(A))$ is equal to

$$f(w) = \sum_{k=1}^p w_{j_k} \big[ g(s_{j_1} + \cdots + s_{j_k}) - g(s_{j_1} + \cdots + s_{j_{k-1}}) \big],$$

where the components of $w$ are ordered as $w_{j_1} \geq \cdots \geq w_{j_p}$. If $s = 1_V$, i.e., $F(A) = g(|A|)$, then
$f(w) = \sum_{k=1}^p w_{j_k} [g(k) - g(k - 1)]$.
The Lovász extension is thus a function of order statistics.
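The order-statistics formula can be compared with a direct evaluation of the Lovász extension. The sketch below assumes the usual sorted (greedy) expression of the Lovász extension; the function `lovasz_extension`, the choice $g = \sqrt{\cdot}$ and the vector `w` are illustrative assumptions.

```python
import math

def lovasz_extension(F, w):
    """Sorted formula: order the components of w decreasingly and sum
    w[j_k] * (F({j_1..j_k}) - F({j_1..j_{k-1}}))."""
    order = sorted(range(len(w)), key=lambda k: -w[k])
    value, prefix, previous = 0.0, set(), F(frozenset())
    for j in order:
        prefix.add(j)
        current = F(frozenset(prefix))
        value += w[j] * (current - previous)
        previous = current
    return value

g = math.sqrt                      # concave, with g(0) = 0
F = lambda A: g(len(A))            # F(A) = g(|A|), i.e., s = 1_V

w = [0.3, -1.2, 2.0, 0.7]
f_greedy = lovasz_extension(F, w)

# Closed form: k-th largest component of w times g(k) - g(k - 1).
w_sorted = sorted(w, reverse=True)
f_order = sum(w_sorted[k - 1] * (g(k) - g(k - 1))
              for k in range(1, len(w) + 1))
print(f_greedy, f_order)
```

Evaluating `lovasz_extension` at an indicator vector returns the value of the set-function on the corresponding set, which is a quick sanity check of the helper.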



10.2 Cut functions 

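Given a set of (not necessarily symmetric) weights $d : V \times V \to \mathbb{R}_+$, define

$$F(A) = \sum_{k \in A,\, j \in V \setminus A} d(k, j),$$

which we denote $d(A, V \setminus A)$. Note that for a cut function and disjoint subsets $A, B, C$, we
always have:

$$F(A \cup B \cup C) = F(A \cup B) + F(A \cup C) + F(B \cup C) - F(A) - F(B) - F(C) + F(\varnothing),$$

$$F(A \cup B) = d(A \cup B, (A \cup B)^c) = d(A, A^c \cap B^c) + d(B, A^c \cap B^c) \leq d(A, A^c) + d(B, B^c) = F(A) + F(B),$$

where we denote $A^c = V \setminus A$. We then have, for any sets $A, B \subset V$:

$$\begin{aligned}
F(A \cup B) &= F([A \cap B] \cup [A \setminus B] \cup [B \setminus A]) \\
&= F([A \cap B] \cup [A \setminus B]) + F([A \cap B] \cup [B \setminus A]) + F([A \setminus B] \cup [B \setminus A]) \\
&\quad - F(A \cap B) - F(A \setminus B) - F(B \setminus A) + F(\varnothing) \\
&= F(A) + F(B) + F(A \Delta B) - F(A \cap B) - F(A \setminus B) - F(B \setminus A) \\
&= F(A) + F(B) - F(A \cap B) + [F(A \Delta B) - F(A \setminus B) - F(B \setminus A)] \\
&\leq F(A) + F(B) - F(A \cap B),
\end{aligned}$$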



which shows submodularity. Moreover, the Lovász extension is equal to

$$f(w) = \sum_{k, j \in V} d(k, j) (w_k - w_j)_+.$$

Figure 2: Two-dimensional grid with 4-connectivity.



If the weight function $d$ is symmetric, then the submodular function is also symmetric
and the Lovász extension is even (from Prop. 4). Examples of such cuts are shown in Figure 3
(left and middle). An instance of these Lovász extensions plays a crucial role in signal and
image processing; indeed, for a graph composed of a two-dimensional grid with 4-connectivity
(see Figure 2), we obtain the total variation. In fact, some of the results presented in this
tutorial were first tackled on this particular case (see, e.g., [16] and references therein).
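The link between cuts and total variation can be illustrated on a one-dimensional chain, used here as a simpler stand-in for the two-dimensional grid. The sketch assumes the sorted formula for the Lovász extension and the expression $\sum_{k,j} d(k,j)(w_k - w_j)_+$ for cut functions; the graph, the weights and the vector `w` are hypothetical.

```python
def lovasz_extension(F, w):
    """Sorted formula for the Lovász extension of a set-function F."""
    order = sorted(range(len(w)), key=lambda k: -w[k])
    value, prefix, previous = 0.0, set(), F(frozenset())
    for j in order:
        prefix.add(j)
        current = F(frozenset(prefix))
        value += w[j] * (current - previous)
        previous = current
    return value

def cut_function(d):
    """Return F(A) = sum of d[k][j] over k in A and j outside A."""
    p = len(d)
    return lambda A: sum(d[k][j] for k in A for j in range(p) if j not in A)

def cut_extension(d, w):
    """Sum over ordered pairs (k, j) of d[k][j] * max(w_k - w_j, 0)."""
    p = len(d)
    return sum(d[k][j] * max(w[k] - w[j], 0.0)
               for k in range(p) for j in range(p))

# Chain 0 - 1 - 2 - 3 with symmetric unit weights.
p = 4
d = [[1.0 if abs(k - j) == 1 else 0.0 for j in range(p)] for k in range(p)]
w = [0.5, 2.0, -1.0, 0.25]

f_greedy = lovasz_extension(cut_function(d), w)
f_closed = cut_extension(d, w)
total_variation = sum(abs(w[k + 1] - w[k]) for k in range(p - 1))
print(f_greedy, f_closed, total_variation)
```

For symmetric weights each edge contributes $(w_k - w_j)_+ + (w_j - w_k)_+ = |w_k - w_j|$, which is why the three quantities coincide on this example.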

We can also consider partial minimization to obtain "regular functions" [5]. Examples lead
to $f(w) = \max_{k \in G} w_k - \min_{k \in G} w_k$, which corresponds to $F(A) = 1_{A \cap G \neq \varnothing} - 1_{G \subset A}$.

It may also lead to "noisy cuts", i.e., for a given weight function $d : V \times V \to \mathbb{R}_+$, we
add $p$ nodes, each of them associated with one of the original nodes, and consider the convex
function $f$ and the submodular function $F$ defined as

$$f(w) = \min_{v \in \mathbb{R}^p} \sum_{k, j \in V} d(k, j)(v_k - v_j)_+ + \lambda \sum_{k \in V} a_k |v_k - w_k|,$$

$$F(A) = \min_{B \subset V} \sum_{k \in B,\, j \in B^c} d(k, j) + \lambda \sum_{k \in V} a_k |1_{k \in A} - 1_{k \in B}|,$$

which are associated to each other due to Prop. 18. An example of such a cut is shown in
Figure 3 (right).

This example is particularly interesting, because it leads to a family of submodular functions
for which dedicated fast algorithms exist. Indeed, minimizing the cut functions or the
partially minimized cut, plus a modular function defined by $z \in \mathbb{R}^p$, may be done with a
min-cut/max-flow algorithm (see, e.g., [M]). Following [5] [16], we add two nodes to
the graph, a source $s$ and a sink $t$. All original edges have non-negative capacities $d(k, j)$,
while the edge that links the source $s$ to the node $k \in V$ has capacity $(z_k)_+$ and the edge
that links the node $k \in V$ to the sink $t$ has capacity $-(z_k)_-$ (see bottom line of Figure 3).
Finding a minimum cut or maximum flow in this graph leads to a minimizer of $F - z$.
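The source/sink construction can be tried on a toy graph and compared with exhaustive search. This is a sketch only: it uses a plain Edmonds-Karp max-flow on a dense matrix rather than any of the dedicated algorithms cited in the text, it takes $-(z_k)_-$ to mean $\max(-z_k, 0)$, and the weights `d` and vector `z` are hypothetical.

```python
from collections import deque
from itertools import combinations

def source_side_of_min_cut(cap, source, sink):
    """Edmonds-Karp max-flow on a dense capacity matrix. Returns the nodes
    still reachable from `source` in the final residual graph, which form
    the source side of a minimum cut."""
    n = len(cap)
    residual = [list(row) for row in cap]
    while True:
        parent = [None] * n
        parent[source] = source
        queue = deque([source])
        while queue:                      # breadth-first search
            u = queue.popleft()
            for v in range(n):
                if parent[v] is None and residual[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if parent[sink] is None:          # no augmenting path is left
            return {v for v in range(n) if parent[v] is not None}
        path, v = [], sink
        while v != source:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[a][b] for a, b in path)
        for a, b in path:
            residual[a][b] -= push
            residual[b][a] += push

def minimize_cut_minus_modular(d, z):
    """Return a set A minimizing F(A) - z(A), with F the cut function of d."""
    p = len(d)
    source, sink = p, p + 1
    cap = [[0.0] * (p + 2) for _ in range(p + 2)]
    for k in range(p):
        for j in range(p):
            cap[k][j] = d[k][j]
        cap[source][k] = max(z[k], 0.0)   # (z_k)_+
        cap[k][sink] = max(-z[k], 0.0)    # -(z_k)_-
    reachable = source_side_of_min_cut(cap, source, sink)
    return frozenset(k for k in range(p) if k in reachable)

def objective(d, z, A):
    p = len(d)
    cut = sum(d[k][j] for k in A for j in range(p) if j not in A)
    return cut - sum(z[k] for k in A)

d = [[0, 2, 0, 0], [1, 0, 3, 0], [0, 1, 0, 1], [0, 0, 2, 0]]
z = [1.5, -0.5, 2.5, -1.0]
A_star = minimize_cut_minus_modular(d, z)
best = min(objective(d, z, frozenset(c))
           for r in range(len(d) + 1) for c in combinations(range(len(d)), r))
print(sorted(A_star), objective(d, z, A_star), best)
```

The capacity of the cut with source side $\{s\} \cup A$ equals $F(A) - z(A)$ plus the constant $\sum_{k \in V} (z_k)_+$, so the minimum cut and the brute-force minimum agree in value.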

For proximal methods, such as defined in Eq. (9) (Section 6), we have $z = \psi(\alpha)$ and we need
to solve an instance of a parametric max-flow problem, which may be done using efficient
dedicated algorithms [HI [H [16]. See also Section 8.5 for generic algorithms based on a
sequence of submodular function minimizations.

Figure 3: Top: graphs for symmetric (left) and non-symmetric cost functions. Bottom:
corresponding networks (note that for the right plot, this corresponds to a partial minimization,
which we refer to in the text as noisy cuts).

10.3 Set covers 

Given a non-negative function $D : 2^V \to \mathbb{R}_+$, we can define

$$F(A) = \sum_{G \subset V,\, G \cap A \neq \varnothing} D(G),$$

with $f(w) = \sum_{G \subset V} D(G) \max_{k \in G} w_k$. The submodularity and the Lovász extension can
be obtained using linearity and the fact that the Lovász extension of $A \mapsto 1_{G \cap A \neq \varnothing}$ is
$w \mapsto \max_{k \in G} w_k$.
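The expression of the Lovász extension as a weighted sum of maxima can be checked against the sorted formula on a small example. In this sketch the groups and weights stored in `D` and the vector `w` are hypothetical, and `lovasz_extension` is an illustrative helper.

```python
def lovasz_extension(F, w):
    """Sorted formula for the Lovász extension of a set-function F."""
    order = sorted(range(len(w)), key=lambda k: -w[k])
    value, prefix, previous = 0.0, set(), F(frozenset())
    for j in order:
        prefix.add(j)
        current = F(frozenset(prefix))
        value += w[j] * (current - previous)
        previous = current
    return value

# Hypothetical groups G with non-negative weights D(G) on V = {0, 1, 2, 3}.
D = {frozenset({0, 1}): 1.0, frozenset({1, 2, 3}): 0.5, frozenset({3}): 2.0}

def F(A):
    """Sum of D(G) over the groups G that intersect A."""
    return sum(weight for G, weight in D.items() if G & A)

def f_closed(w):
    """Sum over groups of D(G) times the largest component of w on G."""
    return sum(weight * max(w[k] for k in G) for G, weight in D.items())

w = [0.2, -0.4, 1.5, 0.7]
print(lovasz_extension(F, w), f_closed(w))
```

The check also works for vectors with negative components, since each group contributes its weight exactly at the first sorted index that enters the group.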

Möbius inversion. Note that any set-function $F$ may be written as

$$F(A) = \sum_{G \subset V,\, G \cap A \neq \varnothing} D(G) = \sum_{G \subset V} D(G) - \sum_{G \subset V \setminus A} D(G),$$

for a certain set-function $D$, which is not usually non-negative. Indeed, by the Möbius inversion
formula (see, e.g., [32]), we have:

$$D(G) = \sum_{A \subset G} (-1)^{|G| - |A|} \big[ F(V) - F(V \setminus A) \big].$$

Thus, functions for which $D$ is non-negative are a specific subset of submodular functions.
Moreover, these functions are always non-decreasing. Such functions are used in the context
of sparsity-inducing norms [H [33] [M].

Figure 4: Left: Groups corresponding to a hierarchy. Right: network flow interpretation of
same submodular function.
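The Möbius inversion formula can be verified by brute force on a small ground set. The sketch below assumes a set-function with $F(\varnothing) = 0$ and uses the cut function of a hypothetical chain graph, for which some of the recovered weights are negative; the helper names are illustrative.

```python
from itertools import combinations

def subsets(items):
    """Return all subsets of `items` as frozensets."""
    items = sorted(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def mobius_weights(F, V):
    """D(G) = sum over A subset of G of (-1)^(|G|-|A|) [F(V) - F(V minus A)]."""
    V = frozenset(V)
    return {G: sum((-1) ** (len(G) - len(A)) * (F(V) - F(V - A))
                   for A in subsets(G))
            for G in subsets(V) if G}

def reconstruct(D, A):
    """Sum of D(G) over the groups G that intersect A."""
    return sum(weight for G, weight in D.items() if G & A)

V = range(4)
edges = [(0, 1), (1, 2), (2, 3)]     # hypothetical chain graph

def cut(A):
    """Symmetric cut function: number of edges with exactly one end in A."""
    return sum(1.0 for k, j in edges if (k in A) != (j in A))

D = mobius_weights(cut, V)
max_error = max(abs(reconstruct(D, A) - cut(A)) for A in subsets(V))
print(max_error, min(D.values()))
```

The reconstruction is exact here, and the negative entries of `D` reflect the fact that a cut function is not non-decreasing.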

Reinterpretation in terms of set-covers. Let $W$ be any "base" set. Given, for each
$k \in V$, a set $S_k \subset W$, we define $F(A) = \big| \bigcup_{k \in A} S_k \big|$. More generally, we can define
$F(A) = \sum_{j \in W} \Delta(j) 1_{\exists k \in A,\, S_k \ni j}$, if we have weights $\Delta(j) \in \mathbb{R}_+$ for $j \in W$ (this corresponds to
replacing the cardinality function on $W$ by a weighted cardinality function, with weights $\Delta$).
Then, $F$ is submodular (as a consequence of the equivalence with the previously defined
functions, which we now prove).

These two types of functions are in fact equivalent. Indeed, for a weight function $D : 2^V \to \mathbb{R}_+$,
we let $W = 2^V$, $S_k = \{G \subset V,\ G \ni k\}$, and $\Delta(G) = D(G)$, to obtain a set cover.

For a certain set cover defined by $W$, $S_k \subset W$, $k \in V$, and $\Delta$, define

$$D(G) = \sum_{j \in W} \Delta(j) 1_{G = \{k \in V,\ S_k \ni j\}},$$

to obtain a set-function expressed in terms of groups and non-negative weight functions.
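The passage from a weighted set cover to group weights can be made concrete as follows. This is a minimal sketch with a hypothetical base set, hypothetical sets `S[k]` and weights `Delta`; each element of the base set is assigned to the group of indices whose sets contain it.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical set cover: base set W = {a, ..., f}, one set S[k] per k in V.
S = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c", "d", "e"}}
Delta = {"a": 1.0, "b": 2.0, "c": 0.5, "d": 1.5, "e": 1.5, "f": 3.0}

def F_cover(A):
    """Weighted size of the union of the sets S[k], k in A."""
    covered = set().union(*(S[k] for k in A))
    return sum(Delta[j] for j in covered)

# D(G) collects the weights of the elements j whose group {k : j in S[k]} is G.
D = defaultdict(float)
for j, weight in Delta.items():
    G = frozenset(k for k in S if j in S[k])
    if G:                      # elements covered by no set never contribute
        D[G] += weight

def F_groups(A):
    """Sum of D(G) over the groups G that intersect A."""
    return sum(weight for G, weight in D.items() if G & set(A))

all_sets = [frozenset(c) for r in range(len(S) + 1)
            for c in combinations(S, r)]
max_gap = max(abs(F_cover(A) - F_groups(A)) for A in all_sets)
print(dict(D), max_gap)
```

Elements that are covered by the same collection of sets, such as `d` and `e` here, are merged into a single group with the sum of their weights.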



Examples. In Figure 4, we show a set of groups (i.e., only the groups $G \subset V$ for which
$D(G) > 0$), which can be embedded into a hierarchy, as well as the corresponding flow
interpretation from Section 10.4. We also show in Figure 5 and Figure 6 examples in one
dimension.



10.4 Flows 

Following [18], we can obtain a family of non-decreasing submodular set-functions (which
include set covers) from multi-sink multi-source networks. We define a weight function on
a set $W$, which includes a set $S$ of sources and a set $V$ of sinks (which will be the set on
which the submodular function will be defined). We assume that we are given capacities,
i.e., a function $c$ from $W \times W$ to $\mathbb{R}_+$. For all functions $\varphi : W \times W \to \mathbb{R}$, we use the notation
$\varphi(A, B) = \sum_{k \in A,\, j \in B} \varphi(k, j)$.

A flow is a function $\varphi : W \times W \to \mathbb{R}_+$ such that (a) $\varphi \leq c$ for all arcs, (b) for all
$w \in W \setminus (S \cup V)$, the net-flow at $w$, i.e., $\varphi(W, \{w\}) - \varphi(\{w\}, W)$, is null, (c) for all sources
$s \in S$, the net-flow at $s$ is non-positive, i.e., $\varphi(W, \{s\}) - \varphi(\{s\}, W) \leq 0$, (d) for all sinks
$t \in V$, the net-flow at $t$ is non-negative, i.e., $\varphi(W, \{t\}) - \varphi(\{t\}, W) \geq 0$. We denote by $\mathcal{F}$
the set of flows.

For $A \subset V$ (the set of sinks), we define

$$F(A) = \max_{\varphi \in \mathcal{F}} \varphi(W, A) - \varphi(A, W),$$

which is the maximal net-flow getting out of $A$. From the max-flow/min-cut theorem (see,
e.g., [31]), we have immediately that

$$F(A) = \min_{X \subset W,\ S \subset X,\ A \subset W \setminus X} c(X, W \setminus X).$$



One then obtains that $F$ is submodular (as the partial minimization of a cut function) and
non-decreasing by construction. One particularity is that for this type of non-decreasing
submodular functions, we have an explicit description of the positive submodular polyhedron.
Indeed, $x \in \mathbb{R}_+^p$ belongs to $P(F)$ if and only if there exists a flow $\varphi \in \mathcal{F}$ such that for all
$k \in V$, $x_k = \varphi(W, \{k\}) - \varphi(\{k\}, W)$ is the net-flow getting out of $k$.

Similarly to other cut-derived functions, there are dedicated algorithms for proximal methods
and submodular minimization [35]. See also [33] for applications to sparsity-inducing
norms.
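The min-cut expression of $F$ can be evaluated by enumeration on a tiny network, which also allows submodularity and monotonicity to be checked directly. The network below (one source, two inner nodes, three sinks, and the capacities) is a hypothetical example, and exhaustive enumeration replaces the max-flow algorithms one would use in practice.

```python
from itertools import combinations

def subsets(items):
    """Return all subsets of `items` as frozensets."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

# Hypothetical network: source "s", inner nodes "u" and "v", sinks 0, 1, 2.
sources = frozenset({"s"})
inner = frozenset({"u", "v"})
sinks = frozenset({0, 1, 2})          # ground set V of the set-function
capacity = {("s", "u"): 2.0, ("s", "v"): 1.0,
            ("u", 0): 1.0, ("u", 1): 2.0, ("v", 1): 1.0, ("v", 2): 3.0}

def F(A):
    """Minimum capacity c(X, W minus X) over the sets X that contain all
    sources and contain no element of A."""
    free = (inner | sinks) - frozenset(A)
    return min(
        sum(c for (a, b), c in capacity.items() if a in X and b not in X)
        for X in (sources | extra for extra in subsets(free))
    )

all_sets = subsets(sinks)
submodular = all(F(A) + F(B) + 1e-9 >= F(A | B) + F(A & B)
                 for A in all_sets for B in all_sets)
non_decreasing = all(F(A) <= F(B) + 1e-9
                     for A in all_sets for B in all_sets if A <= B)
print(submodular, non_decreasing, F(frozenset({1})))
```

On this example the value for the single sink `1` is limited by the total capacity leaving the source, which illustrates how sinks compete for the same flow.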



Flow interpretation of set-covers. Following [34], we now show that the submodular
functions defined in this section include the ones defined in Section 10.3. Indeed, consider a
non-negative function $D : 2^V \to \mathbb{R}_+$, and define $F(A) = \sum_{G \subset V,\, G \cap A \neq \varnothing} D(G)$. The Lovász
extension may be written as, for all $w \in \mathbb{R}_+^p$,

$$\begin{aligned}
f(w) &= \sum_{G \subset V} D(G) \max_{k \in G} w_k \\
&= \sum_{G \subset V} \ \max_{t^G \in \mathbb{R}_+^p,\ t^G_{V \setminus G} = 0,\ t^G(G) = D(G)} w^\top t^G \\
&= \max_{t^G \in \mathbb{R}_+^p,\ t^G_{V \setminus G} = 0,\ t^G(G) = D(G),\ G \subset V} \ \sum_{G \subset V} w^\top t^G \\
&= \max_{t^G \in \mathbb{R}_+^p,\ t^G_{V \setminus G} = 0,\ t^G(G) = D(G),\ G \subset V} \ \sum_{k \in V} \Big( \sum_{G \subset V,\, G \ni k} t^G_k \Big) w_k.
\end{aligned}$$

Thus $s \in P(F)$ if and only if there exists $t^G \in \mathbb{R}_+^p$, $t^G_{V \setminus G} = 0$, $t^G(G) = D(G)$ for all
$G \subset V$, such that $s = \sum_{G \subset V} t^G$. This can be given a network flow interpretation on the
graph composed of a single source $s$, one node per subset $G \subset V$ such that $D(G) > 0$, and
the sink set $V$. The source is connected to all subsets $G$, with capacity $D(G)$, and each
subset is connected to the variables it contains, with infinite capacity. We give examples of
such networks in Figure 5 and Figure 6.

10.5 Entropies



Given $p$ random variables $X_1, \dots, X_p$ which all take a finite number of values, we define
$F(A)$ as the joint entropy of the variables $(X_k)_{k \in A}$. This function is submodular because, if
$A \subset B$ and $k \notin B$, $F(A \cup \{k\}) - F(A) = H(X_A, X_k) - H(X_A) = H(X_k | X_A) \geq H(X_k | X_B) =
F(B \cup \{k\}) - F(B)$ (by the data processing inequality [36]).

This can be extended to any distribution by considering differential entropies. One application
is for Gaussian random variables, leading to the submodularity of the function defined
through $F(A) = \log \det Q_{AA}$, for some positive definite matrix $Q \in \mathbb{R}^{p \times p}$ (see further related
examples in Section 10.6).
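The submodularity of the joint entropy can be observed on a small discrete distribution. The joint distribution of three binary variables used below is a hypothetical example, and the entropy is computed in nats from the marginals.

```python
import math
from itertools import combinations

# Hypothetical joint distribution of three binary variables (unnormalized counts).
counts = {(0, 0, 0): 4, (0, 0, 1): 1, (0, 1, 0): 2, (0, 1, 1): 3,
          (1, 0, 0): 1, (1, 0, 1): 2, (1, 1, 0): 1, (1, 1, 1): 2}
total = sum(counts.values())

def F(A):
    """Joint entropy (in nats) of the variables indexed by A."""
    idx = sorted(A)
    marginal = {}
    for x, c in counts.items():
        key = tuple(x[k] for k in idx)
        marginal[key] = marginal.get(key, 0.0) + c / total
    return -sum(q * math.log(q) for q in marginal.values() if q > 0)

V = range(3)
all_sets = [frozenset(c) for r in range(len(V) + 1)
            for c in combinations(V, r)]
submodular = all(F(A) + F(B) + 1e-9 >= F(A | B) + F(A & B)
                 for A in all_sets for B in all_sets)
non_decreasing = all(F(A) <= F(B) + 1e-9
                     for A in all_sets for B in all_sets if A <= B)
print(submodular, non_decreasing)
```

For discrete variables the joint entropy is also non-decreasing, which the second check confirms on this example.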

10.6 Spectral functions of submatrices 

Given a positive semidefinite matrix $Q \in \mathbb{R}^{p \times p}$ and a real-valued function $h$ from $\mathbb{R}_+$ to $\mathbb{R}$,
one may define $\mathrm{tr}[h(Q)]$ as $\sum_{i=1}^p h(\lambda_i)$, where $\lambda_1, \dots, \lambda_p$ are the (nonnegative) eigenvalues
of $Q$ [37]. We can thus define the function $F(A) = \mathrm{tr}\, h(Q_{AA})$ for $A \subset V$.

The concavity of $h$ is not sufficient for submodularity (as can be seen by generating random
examples with $h(\lambda) = \lambda / (\lambda + 1)$).

We know however that the functions $h(\lambda) = \log(\lambda + t)$ for $t \geq 0$ lead to submodular functions;
thus, since for $\rho \in (0, 1)$,

$$\lambda^\rho = \frac{\rho \sin \rho\pi}{\pi} \int_0^\infty \log(1 + \lambda / t)\, t^{\rho - 1}\, dt$$

(see, e.g., [38]), the functions $h(\lambda) = \lambda^\rho$ for $\rho \in (0, 1]$ are positive linear combinations of
functions that lead to non-decreasing submodular set-functions. We thus obtain a non-decreasing
submodular function. Applications may be found in [3].

This can be generalized to functions of the singular values of X(A, B) where X is a rectan- 
gular matrix, by considering the fact that singular values of a matrix X are related to the 



10.7 Best subset selection 

Following [30], we consider $p$ random variables (covariates) $X_1, \dots, X_p$, and a random
response $Y$ with unit variance, i.e., $\mathrm{var}(Y) = 1$. We consider predicting $Y$ linearly from $X$.
We consider $F(A) = \mathrm{var}(Y | X_A)$. The function $F$ is a non-increasing function.

A variable $X_j$ is a suppressor for variable $X_i$ if $|\mathrm{Corr}(Y, X_i | X_j)| > |\mathrm{Corr}(Y, X_i)|$. Following
[30], we assume that there are no suppressor variables given any set $A$, i.e., we assume
that for all $A \subset V$, $i, j \notin A$,

$$|\mathrm{Corr}(Y, X_i | X_j, X_A)| \leq |\mathrm{Corr}(Y, X_i | X_A)|.$$

We then have:

$$\mathrm{var}(Y | X_A, X_k) - \mathrm{var}(Y | X_A) = -\mathrm{Corr}(Y, X_k | X_A)^2,$$

$$\mathrm{var}(Y | X_A, X_j, X_k) - \mathrm{var}(Y | X_A, X_j) = -\mathrm{Corr}(Y, X_k | X_A, X_j)^2.$$

This implies that $F$ is supermodular. Note however that the condition on suppressors is
rather strong.

10.8 Matroids



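Given a set $V$, we consider a family $\mathcal{I}$ of subsets of $V$ such that (a) $\varnothing \in \mathcal{I}$, (b) $I_1 \subset I_2 \in \mathcal{I}
\Rightarrow I_1 \in \mathcal{I}$, and (c) for all $I_1, I_2 \in \mathcal{I}$, $|I_1| < |I_2| \Rightarrow \exists k \in I_2 \setminus I_1$, $I_1 \cup \{k\} \in \mathcal{I}$. The pair
$(V, \mathcal{I})$ is then referred to as a matroid, with $\mathcal{I}$ its family of independent sets. Then the rank
function of the matroid, defined as $\rho(A) = \max_{I \subset A,\, I \in \mathcal{I}} |I|$, is submodular.

The classical example is the graphic matroid; it corresponds to $V$ being the edge set of a
certain graph, and $\mathcal{I}$ being the set of subsets of edges which do not contain any cycle. The
rank function $\rho(A)$ is then equal to the number of vertices of the graph minus the number of
connected components of the subgraph induced by $A$ (i.e., the graph with all vertices and
only the edges in $A$).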
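The rank function of a graphic matroid is easy to compute with a union-find structure, which gives a direct check of submodularity on a small graph. The graph below is hypothetical; the sketch also compares the rank with the number of graph vertices minus the number of connected components of the graph restricted to the edges in $A$.

```python
from itertools import combinations

vertices = range(5)
# Hypothetical graph; the ground set V is the set of edge indices 0..5.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)]

def rank(A):
    """Size of a largest cycle-free subset of the edges in A (union-find)."""
    parent = list(vertices)

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    size = 0
    for e in A:
        a, b = find(edges[e][0]), find(edges[e][1])
        if a != b:             # the edge joins two components: no cycle
            parent[a] = b
            size += 1
    return size

def n_components(A):
    """Connected components of the graph with all vertices and the edges in A."""
    seen, count = set(), 0
    for start in vertices:
        if start in seen:
            continue
        count += 1
        stack = [start]
        while stack:
            x = stack.pop()
            if x in seen:
                continue
            seen.add(x)
            for e in A:
                a, b = edges[e]
                if a == x:
                    stack.append(b)
                elif b == x:
                    stack.append(a)
    return count

all_sets = [frozenset(c) for r in range(len(edges) + 1)
            for c in combinations(range(len(edges)), r)]
submodular = all(rank(A) + rank(B) >= rank(A | B) + rank(A & B)
                 for A in all_sets for B in all_sets)
matches_components = all(rank(A) == len(vertices) - n_components(A)
                         for A in all_sets)
print(submodular, matches_components, rank(range(len(edges))))
```

The greedy scan over edges is enough to compute the rank because, in a matroid, every maximal independent subset of $A$ has the same cardinality.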

Another classical example is the linear matroid. Given a matrix $M$ with $p$ columns, a set $I$ is
independent if and only if the set of columns indexed by $I$ is linearly independent. The rank
function $\rho(A)$ is then the rank of the columns indexed by $A$ (this is also an instance of functions
from Section 10.6).